Methods for identifying risk of breast cancer and treatments thereof

ABSTRACT

Provided herein are methods for identifying risk of breast cancer in a subject and/or a subject at risk of breast cancer, reagents and kits for carrying out the methods, methods for identifying candidate therapeutics for treating breast cancer, and therapeutic methods for treating breast cancer in a subject. These embodiments are based upon an analysis of polymorphic variations in nucleotide sequences within the human genome.

RELATED PATENT APPLICATIONS

This patent application claims the benefit of provisional patent application No. 60/429,136 filed Nov. 25, 2002 and provisional patent application No. 60/490,234 filed Jul. 24, 2003, having attorney docket numbers 524593004100 and 524593004101, respectively. Each of these provisional patent applications names Richard B. Roth et al. as inventors and is hereby incorporated herein by reference in its entirety, including all drawings and cited publications and documents.

FIELD OF THE INVENTION

The invention relates to genetic methods for identifying risk of breast cancer and treatments that specifically target the disease.

BACKGROUND

Breast cancer is the third most common cancer, and the most common cancer in women, as well as a cause of disability, psychological trauma, and economic loss. Breast cancer is the second most common cause of cancer death in women in the United States, in particular for women between the ages of 15 and 54, and the leading cause of cancer-related death (Forbes, Seminars in Oncology, vol. 24(1), Suppl 1, 1997: pp. S1-20-S1-35). Indirect effects of the disease also contribute to the mortality from breast cancer, including consequences of advanced disease, such as metastases to the bone or brain. Complications arising from bone marrow suppression, radiation fibrosis and neutropenic sepsis, which are collateral effects of therapeutic interventions such as surgery, radiation, chemotherapy, or bone marrow transplantation, also contribute to the morbidity and mortality from this disease.

While the pathogenesis of breast cancer is unclear, transformation of normal breast epithelium to a malignant phenotype may be the result of genetic factors, especially in women under thirty (Miki, et al., Science, 266: 66-71 (1994)). However, it is likely that other, non-genetic factors also have a significant effect on the etiology of the disease. Regardless of its origin, breast cancer morbidity increases significantly if it is not detected early in its progression. Thus, considerable efforts have focused on the elucidation of early cellular events surrounding transformation in breast tissue. Such efforts have led to the identification of several potential breast cancer markers. For example, alleles of the BRCA1 and BRCA2 genes have been linked to hereditary and early-onset breast cancer (Wooster, et al., Science, 265: 2088-2090 (1994)). However, BRCA1 is limited as a cancer marker because BRCA1 mutations fail to account for the majority of breast cancers (Ford, et al., British J. Cancer, 72: 805-812 (1995)). Similarly, the BRCA2 gene, which has been linked to forms of hereditary breast cancer, accounts for only a small portion of total breast cancer cases.

SUMMARY

It has been discovered that certain polymorphic variations in human genomic DNA are associated with the occurrence of breast cancer. Thus, featured herein are methods for identifying a risk of breast cancer in a subject, which comprises detecting the presence or absence of one or more of the polymorphic variations described herein in a human nucleic acid sample. Also featured herein are nucleic acids that include one or more polymorphic variations associated with the occurrence of breast cancer, as well as polypeptides encoded by these nucleic acids. Further, provided is a method for identifying a subject at risk of breast cancer and then prescribing to the subject a breast cancer detection procedure, prevention procedure and/or a treatment procedure. In addition, provided are methods for identifying candidate therapeutic molecules for treating breast cancer and related disorders, as well as methods for treating breast cancer in a subject by diagnosing breast cancer in the subject and treating the subject with a suitable treatment, such as administering a therapeutic molecule.

Also provided are compositions comprising a breast cancer cell and/or a nucleic acid comprising a nucleotide sequence in FIGS. 1A-1B or FIG. 2, or a fragment or substantially identical nucleic acid thereof, with a RNAi, siRNA, antisense DNA or RNA, or ribozyme nucleic acid designed from a nucleotide sequence in FIGS. 1A-1B or FIG. 2. In an embodiment, the nucleic acid is designed from a nucleotide sequence in FIGS. 1A-1B or FIG. 2 that includes one or more breast cancer associated polymorphic variations, and in some instances, specifically interacts with such a nucleotide sequence. Further, provided are arrays of nucleic acids bound to a solid surface, in which one or more nucleic acid molecules of the array have a nucleotide sequence from FIGS. 1A-1B or FIG. 2, or a fragment or substantially identical nucleic acid thereof, or a complementary nucleic acid of the foregoing. Featured also are compositions comprising a breast cancer cell and/or a polypeptide encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2, with an antibody that specifically binds to the polypeptide. In an embodiment, the antibody specifically binds to an epitope in the polypeptide that includes a non-synonymous amino acid modification associated with breast cancer (e.g., results in an amino acid substitution in the encoded polypeptide associated with breast cancer).

BRIEF DESCRIPTION OF THE FIGURES

FIGS. 1A-1B include information pertaining to the polymorphic variants associated with breast cancer identified herein. Public information pertaining to the polymorphism and the genomic sequence that includes the polymorphism is indicated. Each genomic sequence identified in FIGS. 1A-1B may be accessed at the http address www.ncbi.nih.gov/entrez/query.fcgi, for example, by using the SNP reference number, or the contig accession number (i.e., the “sequence identification” number in the second column of the table) in conjunction with the current dbSNP build (which is “115” for all of the SNPs described herein). Typically, the genomic sequence accessed by the SNP reference number will include about 1 to about 200 nucleotides flanking the polymorphic site, and the genomic sequence accessed by the Contig Identification number will include a greater number of nucleotides flanking the polymorphic site, including sequences for nearby genes. Each “Contig Position” listed in FIGS. 1A-1B corresponds to a nucleotide position set forth in the contig sequence, and designates the polymorphic site corresponding to the SNP reference number. Sequences containing the polymorphisms also may be referenced by the “Sequence Identification” set forth in FIGS. 1A-1B. The “Sequence Identification” corresponds to cDNA sequence that encodes associated polypeptide(s), or “target polypeptide(s)” (which are also identified by their locus name and locus ID number in FIGS. 1A-1B). The position of the SNP within or near the cDNA sequence (e.g., intronic, exonic, intergenic) is provided in the “Sequence Position” column of FIGS. 1A-1B when it is known; some SNPs fall within two genes or between two genes. Also, the particular allele associated with breast cancer is specified in FIGS. 1A-1B. All nucleotide sequences referenced and accessed by the parameters set forth in FIGS. 1A-1B and FIG. 2 are incorporated herein by reference.

FIG. 2 contains genomic sequence information for three polymorphisms that do not have available reference numbers or sequence identifiers (e.g., SNP reference numbers or contig accession numbers). Each sequence provided represents a genomic sequence immediately adjacent to the polymorphism of interest. The following nucleotide representations are used throughout FIG. 2: “A” or “a” is adenosine, adenine, or adenylic acid; “C” or “c” is cytidine, cytosine, or cytidylic acid; “G” or “g” is guanosine, guanine, or guanylic acid; “T” or “t” is thymidine, thymine, or thymidylic acid; and “I” or “i” is inosine, hypoxanthine, or inosinic acid. Exons are indicated in italicized lower case type, introns are depicted in normal text lower case type, and polymorphic sites are depicted in bold upper case type. SNPs are designated by the following convention: “R” represents A or G; “M” represents A or C; “W” represents A or T; “Y” represents C or T; “S” represents C or G; “K” represents G or T; “V” represents A, C or G; “H” represents A, C, or T; “D” represents A, G, or T; “B” represents C, G, or T; and “N” represents A, G, C, or T.
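The SNP designation convention above can be sketched as a lookup table. The following is only an illustrative Python sketch: the names `AMBIGUITY_CODES` and `expand_site` are hypothetical and do not appear in the figures, and the mapping simply restates the codes listed in the preceding paragraph.

```python
# Ambiguity codes as listed in the FIG. 2 description (hypothetical table name).
AMBIGUITY_CODES = {
    "R": "AG", "M": "AC", "W": "AT", "Y": "CT", "S": "CG", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

# Unambiguous nucleotide representations, including inosine ("I").
PLAIN_BASES = {"A", "C", "G", "T", "I"}


def expand_site(code: str) -> list[str]:
    """Return the alternative nucleotides denoted by a single site code."""
    code = code.upper()
    if code in PLAIN_BASES:
        return [code]
    if code not in AMBIGUITY_CODES:
        raise ValueError(f"unknown nucleotide code: {code!r}")
    return list(AMBIGUITY_CODES[code])
```

For instance, `expand_site("Y")` yields the two alternatives C and T, which is how a polymorphic site printed as “Y” in FIG. 2 would be read.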

DETAILED DESCRIPTION

It has been discovered that the polymorphic variants in FIGS. 1A-1B are associated with occurrence of breast cancer in subjects. Thus, detecting genetic determinants associated with an increased risk of breast cancer occurrence can lead to early identification of a predisposition to breast cancer and early prescription of preventative measures. Also, associating the polymorphic variants with breast cancer has provided new targets for diagnosing breast cancer and screening molecules useful in treatments of breast cancer.

Breast Cancer and Sample Selection

Breast cancer is typically described as the uncontrolled growth of malignant breast tissue. Breast cancers arise most commonly in the lining of the milk ducts of the breast (ductal carcinoma), or in the lobules where breast milk is produced (lobular carcinoma). Other forms of breast cancer include Inflammatory Breast Cancer and Recurrent Breast Cancer. Inflammatory breast cancer is a rare, but very serious, aggressive type of breast cancer. The breast may look red and feel warm with ridges, welts, or hives on the breast; or the skin may look wrinkled. It is sometimes misdiagnosed as a simple infection. Recurrent disease means that the cancer has come back after it has been treated. It may come back in the breast, in the soft tissues of the chest (the chest wall), or in another part of the body.

As used herein, the term “breast cancer” refers to a condition characterized by anomalous rapid proliferation of abnormal cells in one or both breasts of a subject. The abnormal cells often are referred to as “neoplastic cells,” which are transformed cells that can form a solid tumor. The term “tumor” refers to an abnormal mass or population of cells (i.e. two or more cells) that result from excessive or abnormal cell division, whether malignant or benign, and pre-cancerous and cancerous cells. Malignant tumors are distinguished from benign growths or tumors in that, in addition to uncontrolled cellular proliferation, they can invade surrounding tissues and can metastasize. In breast cancer, neoplastic cells may be identified in one or both breasts only and not in another tissue or organ, in one or both breasts and one or more adjacent tissues or organs (e.g. lymph node), or in a breast and one or more non-adjacent tissues or organs to which the breast cancer cells have metastasized.

The term “invasion” as used herein refers to the spread of cancerous cells to adjacent surrounding tissues. The term “metastasis” as used herein refers to a process in which cancer cells travel from one organ or tissue to another non-adjacent organ or tissue. Cancer cells in the breast(s) can spread to tissues and organs of a subject, and conversely, cancer cells from other organs or tissue can invade or metastasize to a breast. Cancerous cells from the breast(s) may invade or metastasize to any other organ or tissue of the body. Breast cancer cells often invade lymph node cells and/or metastasize to the liver, brain and/or bone and spread cancer in these tissues and organs. Breast cancers can spread to other organs and tissues and cause lung cancer, prostate cancer, colon cancer, ovarian cancer, cervical cancer, gastrointestinal cancer, pancreatic cancer, glioblastoma, bladder cancer, hepatoma, colorectal cancer, uterine cervical cancer, endometrial carcinoma, salivary gland carcinoma, kidney cancer, vulval cancer, thyroid cancer, hepatic carcinoma, skin cancer, melanoma, ovarian cancer, neuroblastoma, myeloma, various types of head and neck cancer, acute lymphoblastic leukemia, acute myeloid leukemia, Ewing sarcoma and peripheral neuroepithelioma, and other carcinomas, lymphomas, blastomas, sarcomas, and leukemias.

In an effort to detect breast cancer as early as possible, regular physical exams and screening mammograms often are prescribed and conducted. A diagnostic mammogram often is performed to evaluate a breast complaint or abnormality detected by physical exam or routine screening mammography. If an abnormality seen with diagnostic mammography is suspicious, additional breast imaging (with exams such as ultrasound) or a biopsy may be ordered. A biopsy followed by pathological (microscopic) analysis is a definitive way to determine whether a subject has breast cancer. Excised breast cancer samples often are subjected to the following analyses: diagnosis of the breast tumor and confirmation of its malignancy; maximum tumor thickness; assessment of completeness of excision of invasive and in situ components and microscopic measurements of the shortest extent of clearance; level of invasion; presence and extent of regression; presence and extent of ulceration; histological type and special variants; pre-existing lesion; mitotic rate; vascular invasion; neurotropism; cell type; tumor lymphocyte infiltration; and growth phase.

The stage of a breast cancer can be classified as a range of stages from Stage 0 to Stage IV based on its size and the extent to which it has spread. The following table summarizes the stages:

TABLE A

Stage   Tumor Size        Lymph Node Involvement          Metastasis (Spread)
I       Less than 2 cm    No                              No
II      Between 2-5 cm    No or in same side of breast    No
III     More than 5 cm    Yes, on same side of breast     No
IV      Not applicable    Not applicable                  Yes

Stage 0 cancer is a contained cancer that has not spread beyond the breast ductal system. Fifteen to twenty percent of breast cancers detected by clinical examinations or testing are in Stage 0 (the earliest form of breast cancer). Two types of Stage 0 cancer are lobular carcinoma in situ (LCIS) and ductal carcinoma in situ (DCIS). LCIS indicates high risk for breast cancer. Many physicians do not classify LCIS as a malignancy and often encounter LCIS by chance on breast biopsy while investigating another area of concern. While the microscopic features of LCIS are abnormal and are similar to malignancy, LCIS does not behave as a cancer (and therefore is not treated as a cancer). LCIS is merely a marker for a significantly increased risk of cancer anywhere in the breast. However, bilateral simple mastectomy may be occasionally performed if LCIS patients have a strong family history of breast cancer. In DCIS the cancer cells are confined to milk ducts in the breast and have not spread into the fatty breast tissue or to any other part of the body (such as the lymph nodes). DCIS may be detected on mammogram as tiny specks of calcium (known as microcalcifications) 80% of the time. Less commonly DCIS can present itself as a mass with calcifications (15% of the time); and even less likely as a mass without calcifications (<5% of the time). A breast biopsy is used to confirm DCIS. A standard DCIS treatment is either breast-conserving therapy (BCT), which is lumpectomy followed by radiation treatment, or mastectomy. To date, DCIS patients have chosen equally between lumpectomy and mastectomy as their treatment option, though specific cases may sometimes favor lumpectomy over mastectomy or vice versa.

In Stage I, the primary (original) cancer is 2 cm or less in diameter and has not spread to the lymph nodes. In Stage IIA, the primary tumor is between 2 and 5 cm in diameter and has not spread to the lymph nodes. In Stage IIB, the primary tumor is between 2 and 5 cm in diameter and has spread to the axillary (underarm) lymph nodes; or the primary tumor is over 5 cm and has not spread to the lymph nodes. In Stage IIIA, the primary breast cancer is of any kind and has spread to the axillary (underarm) lymph nodes and to axillary tissues. In Stage IIIB, the primary breast cancer is any size, has attached itself to the chest wall, and has spread to the pectoral (chest) lymph nodes. In Stage IV, the primary cancer has spread out of the breast to other parts of the body (such as bone, lung, liver, brain). The treatment of Stage IV breast cancer focuses on extending survival time and relieving symptoms.

Based in part upon selection criteria set forth above, individuals having breast cancer can be selected for genetic studies. Also, individuals having no history of cancer or breast cancer often are selected for genetic studies. Other selection criteria can include: a tissue or fluid sample is derived from an individual characterized as Caucasian; the sample was derived from an individual of German paternal and maternal descent; the database included relevant phenotype information for the individual; case samples were derived from individuals diagnosed with breast cancer; control samples were derived from individuals free of cancer and with no family history of breast cancer; and sufficient genomic DNA was extracted from each blood sample for all allelotyping and genotyping reactions performed during the study. Phenotype information included pre- or post-menopausal status, familial predisposition, country of origin of mother and father, diagnosis with breast cancer (date of primary diagnosis, age of individual as of primary diagnosis, grade or stage of development, occurrence of metastases, e.g., lymph node metastases, organ metastases), condition of body tissue (skin tissue, breast tissue, ovary tissue, peritoneum tissue and myometrium), and method of treatment (surgery, chemotherapy, hormone therapy, radiation therapy).

Provided herein is a set of blood samples and a set of corresponding nucleic acid samples isolated from the blood samples, where the blood samples are donated from individuals diagnosed with breast cancer. The sample set often includes blood samples or nucleic acid samples from 100 or more, 150 or more, or 200 or more individuals having breast cancer, and sometimes from 250 or more, 300 or more, 400 or more, or 500 or more individuals. The individuals can have parents from any place of origin, and in an embodiment, the set of samples are extracted from individuals of German paternal and German maternal ancestry. The samples in each set may be selected based upon five or more criteria and/or phenotypes set forth above.

Polymorphic Variants Associated with Breast Cancer

A genetic analysis provided herein linked breast cancer with polymorphic variant nucleic acid sequences in the human genome. As used herein, the term “polymorphic site” refers to a region in a nucleic acid at which two or more alternative nucleotide sequences are observed in a significant number of nucleic acid samples from a population of individuals. A polymorphic site may be a nucleotide sequence of two or more nucleotides, an inserted nucleotide or nucleotide sequence, a deleted nucleotide or nucleotide sequence, or a microsatellite, for example. A polymorphic site that is two or more nucleotides in length may be 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more, 20 or more, 30 or more, 50 or more, 75 or more, 100 or more, 500 or more, or about 1000 nucleotides in length, where all or some of the nucleotide sequences differ within the region. A polymorphic site is often one nucleotide in length, which is referred to herein as a “single nucleotide polymorphism” or a “SNP.”

Where there are two, three, or four alternative nucleotide sequences at a polymorphic site, each nucleotide sequence is referred to as a “polymorphic variant” or “nucleic acid variant.” Where two polymorphic variants exist, for example, the polymorphic variant represented in a minority of samples from a population is sometimes referred to as a “minor allele” and the polymorphic variant that is more prevalently represented is sometimes referred to as a “major allele.” Many organisms possess two copies of each chromosome (e.g., humans), and those individuals who possess two major alleles or two minor alleles are often referred to as being “homozygous” with respect to the polymorphism, and those individuals who possess one major allele and one minor allele are normally referred to as being “heterozygous” with respect to the polymorphism. Individuals who are homozygous with respect to one allele are sometimes predisposed to a different phenotype as compared to individuals who are heterozygous or homozygous with respect to another allele.
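The major/minor allele and homozygous/heterozygous terminology above can be illustrated with a short Python sketch. It assumes, purely for illustration, that each individual's genotype at a biallelic site is written as a two-letter string such as "CT"; the function names are hypothetical.

```python
from collections import Counter


def major_and_minor(genotypes):
    """Return (major, minor) alleles for a biallelic site.

    `genotypes` is a list of two-letter strings, one per individual,
    e.g. ["CC", "CT", "TT"] (an assumed encoding, not one from the figures).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) != 2:
        raise ValueError("expected exactly two alleles at the site")
    (major, _), (minor, _) = counts.most_common(2)
    return major, minor


def zygosity(genotype):
    """Classify one two-letter genotype as homozygous or heterozygous."""
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"
```

With the genotypes `["CC", "CT", "CC", "TT"]`, C is counted five times and T three times, so C is the major allele and T the minor allele; the individual typed "CT" is heterozygous.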

Furthermore, a genotype or polymorphic variant may be expressed in terms of a “haplotype,” which as used herein refers to two or more polymorphic variants occurring within genomic DNA in a group of individuals within a population. For example, two SNPs may exist within a gene where each SNP position includes a cytosine variation and an adenine variation. Certain individuals in a population may carry one allele (heterozygous) or two alleles (homozygous) having the gene with a cytosine at each SNP position. As the two cytosines corresponding to each SNP in the gene travel together on one or both alleles in these individuals, the individuals can be characterized as having a cytosine/cytosine haplotype with respect to the two SNPs in the gene.

As used herein, the term “phenotype” refers to a trait which can be compared between individuals, such as presence or absence of a condition, a visually observable difference in appearance between individuals, metabolic variations, physiological variations, variations in the function of biological molecules, and the like. An example of a phenotype is occurrence of breast cancer.

Researchers sometimes report a polymorphic variant in a database without determining whether the variant is represented in a significant fraction of a population. Because a subset of these reported polymorphic variants is not represented in a statistically significant portion of the population, some of them are sequencing errors and/or not biologically relevant. Thus, it is often not known whether a reported polymorphic variant is statistically significant or biologically relevant until the presence of the variant is detected in a population of individuals and the frequency of the variant is determined. Methods for detecting a polymorphic variant in a population are described herein, specifically in Example 2. A polymorphic variant is statistically significant and often biologically relevant if it is represented in 5% or more of a population, sometimes 10% or more, 15% or more, or 20% or more of a population, and often 25% or more, 30% or more, 35% or more, 40% or more, 45% or more, or 50% or more of a population.
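The representation thresholds above can be sketched as a simple check. This Python sketch rests on an assumption: it reads “represented in 5% or more of a population” as the fraction of genotyped individuals who carry the variant at least once, which is one possible interpretation and not necessarily the calculation used in Example 2. The function names and the two-letter genotype encoding are hypothetical.

```python
def carrier_fraction(genotypes, allele):
    """Fraction of individuals whose genotype contains `allele` at least once.

    `genotypes` is a list of two-letter strings such as "CT" (assumed encoding).
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    carriers = sum(1 for genotype in genotypes if allele in genotype)
    return carriers / len(genotypes)


def is_represented(genotypes, allele, threshold=0.05):
    """True if the variant is carried by at least `threshold` of the sample."""
    return carrier_fraction(genotypes, allele) >= threshold
```

For a sample of eighteen "CC" and two "CT" individuals, the T variant is carried by 2 of 20 individuals, so it passes the 5% threshold but not a 15% threshold.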

A polymorphic variant may be detected on either or both strands of a double-stranded nucleic acid. Also, a polymorphic variant may be located within an intron or exon of a gene or within a portion of a regulatory region such as a promoter, a 5′ untranslated region (UTR), or a 3′ UTR, and may be present in DNA (e.g., genomic DNA (gDNA) and complementary DNA (cDNA)), RNA (e.g., mRNA, tRNA, and rRNA), or a polypeptide. Polymorphic variations may or may not result in detectable differences in gene expression, polypeptide structure, or polypeptide function.

In the genetic analysis that associated breast cancer with the polymorphic variants set forth in FIGS. 1A-1B, samples from individuals having breast cancer and individuals not having cancer were allelotyped and genotyped. The term “genotyped” as used herein refers to a process for determining a genotype of one or more individuals, where a “genotype” is a representation of one or more polymorphic variants in a population.

Additional Polymorphic Variants Associated with Breast Cancer

Also provided is a method for identifying polymorphic variants proximal to an incident, founder polymorphic variant associated with breast cancer. Thus, featured herein are methods for identifying a polymorphic variation associated with breast cancer that is proximal to an incident polymorphic variation associated with breast cancer, which comprises identifying a polymorphic variant proximal to the incident polymorphic variant associated with breast cancer, where the incident polymorphic variant is in a nucleotide sequence set forth in FIGS. 1A-1B. The nucleotide sequence often comprises a polynucleotide sequence selected from the group consisting of (a) a nucleotide sequence set forth in FIGS. 1A-1B or FIG. 2; (b) a nucleotide sequence which encodes a polypeptide having an amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (c) a nucleotide sequence which encodes a polypeptide that is 90% or more identical to an amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2 or a nucleotide sequence about 90% or more identical to the nucleotide sequence set forth in FIGS. 1A-1B or FIG. 2; and (d) a fragment of a nucleotide sequence of (a), (b), or (c), often a fragment that includes a polymorphic site associated with breast cancer. The presence or absence of an association of the proximal polymorphic variant with breast cancer then is determined using a known association method, such as a method described in the Examples hereafter. In an embodiment, the incident polymorphic variant is described in FIGS. 1A-1B. In another embodiment, the proximal polymorphic variant identified sometimes is a publicly disclosed polymorphic variant, which for example, sometimes is published in a publicly available database. In other embodiments, the polymorphic variant identified is not publicly disclosed and is discovered using a known method, including, but not limited to, sequencing a region surrounding the incident polymorphic variant in a group of nucleic acid samples. Thus, multiple polymorphic variants proximal to an incident polymorphic variant are associated with breast cancer using this method.

The proximal polymorphic variant often is identified in a region surrounding the incident polymorphic variant. In certain embodiments, this surrounding region is about 50 kb flanking the first polymorphic variant (e.g. about 50 kb 5′ of the first polymorphic variant and about 50 kb 3′ of the first polymorphic variant), and the region sometimes is composed of shorter flanking sequences, such as flanking sequences of about 40 kb, about 30 kb, about 25 kb, about 20 kb, about 15 kb, about 10 kb, about 7 kb, about 5 kb, or about 2 kb 5′ and 3′ of the incident polymorphic variant. In other embodiments, the region is composed of longer flanking sequences, such as flanking sequences of about 55 kb, about 60 kb, about 65 kb, about 70 kb, about 75 kb, about 80 kb, about 85 kb, about 90 kb, about 95 kb, or about 100 kb 5′ and 3′ of the incident polymorphic variant.
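The flanking-region selection above can be sketched as a coordinate calculation. This Python sketch assumes 1-based contig positions (as in the “Contig Position” column) and takes 1 kb to be 1000 nucleotides; the function names and the optional `contig_length` parameter are hypothetical.

```python
def flanking_region(contig_position, flank_kb=50, contig_length=None):
    """Return (start, end) of the region flank_kb 5' and 3' of a position.

    Positions are treated as 1-based and inclusive (an assumption); the
    region is clipped to the contig when `contig_length` is given.
    """
    flank = int(flank_kb * 1000)
    start = max(1, contig_position - flank)
    end = contig_position + flank
    if contig_length is not None:
        end = min(end, contig_length)
    return start, end


def is_proximal(candidate_position, incident_position, flank_kb=50):
    """True if a candidate variant lies within the flanking region."""
    start, end = flanking_region(incident_position, flank_kb)
    return (start <= candidate_position <= end
            and candidate_position != incident_position)
```

An incident variant at contig position 120,000 with the default 50 kb flank gives the region 70,000 to 170,000; a candidate at position 100,000 falls inside it, whereas one at 200,000 would only be captured by a longer flank such as 100 kb.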

In certain embodiments, polymorphic variants associated with breast cancer are identified iteratively. For example, a first proximal polymorphic variant is associated with breast cancer using the methods described above and then another polymorphic variant proximal to the first proximal polymorphic variant is identified (e.g., publicly disclosed or discovered) and the presence or absence of an association of one or more other polymorphic variants proximal to the first proximal polymorphic variant with breast cancer is determined.

The methods described herein are useful for identifying or discovering additional polymorphic variants that may be used to further characterize a gene, region or locus associated with a condition, a disease (e.g., breast cancer), or a disorder. For example, allelotyping or genotyping data from the additional polymorphic variants may be used to identify a functional mutation or a region of linkage disequilibrium.

In certain embodiments, polymorphic variants identified or discovered within a region comprising the first polymorphic variant associated with breast cancer are genotyped using the genetic methods and sample selection techniques described herein, and it can be determined whether those polymorphic variants are in linkage disequilibrium with the first polymorphic variant. The size of the region in linkage disequilibrium with the first polymorphic variant also can be assessed using these genotyping methods. Thus, provided herein are methods for determining whether a polymorphic variant is in linkage disequilibrium with a first polymorphic variant associated with breast cancer, and such information can be used in prognosis methods described herein.
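One way such a linkage disequilibrium determination might be sketched is with the standard pairwise measures D, D′ and r². The text above does not name a particular measure, so the choice of these statistics is an assumption, as are the function name and the input format (counts of phased two-SNP haplotypes written as two-letter strings).

```python
def linkage_disequilibrium(haplotype_counts, a="C", b="C"):
    """Compute (D, D', r^2) for allele `a` at SNP 1 and allele `b` at SNP 2.

    `haplotype_counts` maps two-letter haplotypes (allele at SNP 1 followed
    by allele at SNP 2) to observed counts, e.g. {"CC": 50, "AA": 50}.
    """
    total = sum(haplotype_counts.values())
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_ab = haplotype_counts.get(a + b, 0) / total
    p_a = sum(n for h, n in haplotype_counts.items() if h[0] == a) / total
    p_b = sum(n for h, n in haplotype_counts.items() if h[1] == b) / total

    # D is the deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, d_prime, r_squared
```

When only cytosine/cytosine and adenine/adenine haplotypes are observed in equal numbers, the two SNPs always travel together and both D′ and r² equal 1; when all four haplotypes are equally frequent, both are 0.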

Isolated Nucleic Acids

Featured herein are isolated nucleic acid variants depicted in FIGS. 1A-1B, and substantially identical nucleic acids thereof. A nucleic acid variant may be represented on one or both strands in a double-stranded nucleic acid or on one chromosomal complement (heterozygous) or both chromosomal complements (homozygous).

As used herein, the term “nucleic acid” includes DNA molecules (e.g., a complementary DNA (cDNA) and genomic DNA (gDNA)) and RNA molecules (e.g., mRNA, rRNA, siRNA and tRNA) and analogs of DNA or RNA, for example, by use of nucleotide analogs. The nucleic acid molecule can be single-stranded and it is often double-stranded. The term “isolated or purified nucleic acid” refers to nucleic acids that are separated from other nucleic acids present in the natural source of the nucleic acid. For example, with regard to genomic DNA, the term “isolated” includes nucleic acids which are separated from the chromosome with which the genomic DNA is naturally associated. An “isolated” nucleic acid is often free of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5′ and/or 3′ ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated nucleic acid molecule can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of 5′ and/or 3′ nucleotide sequences which flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived. Moreover, an “isolated” nucleic acid molecule, such as a cDNA molecule, can be substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. As used herein, the term “gene” refers to a nucleotide sequence that encodes a polypeptide.

Also included herein are nucleic acid fragments. These fragments often include a nucleotide sequence identical to a nucleotide sequence in FIGS. 1A-1B or FIG. 2, a nucleotide sequence substantially identical to a nucleotide sequence in FIGS. 1A-1B or FIG. 2, or a nucleotide sequence that is complementary to the foregoing. The nucleic acid fragment may be identical, substantially identical or homologous to a nucleotide sequence in an exon or an intron in a nucleotide sequence of FIGS. 1A-1B or FIG. 2, and may encode a domain or part of a domain of a polypeptide. Sometimes, the fragment comprises one or more of the polymorphic variations described herein as being associated with breast cancer. The nucleic acid fragment is often 50, 100, or 200 or fewer base pairs in length, and is sometimes about 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 2000, 3000, 4000, 5000, 10000, 15000, or 20000 base pairs in length. A nucleic acid fragment that is complementary to a nucleotide sequence identical or substantially identical to a nucleotide sequence in FIGS. 1A-1B or FIG. 2 and hybridizes to such a nucleotide sequence under stringent conditions is often referred to as a “probe.” Nucleic acid fragments often include one or more polymorphic sites, or sometimes have an end that is adjacent to a polymorphic site as described hereafter.

An example of a nucleic acid fragment is an oligonucleotide. As used herein, the term “oligonucleotide” refers to a nucleic acid comprising about 8 to about 50 covalently linked nucleotides, often comprising from about 8 to about 35 nucleotides, and more often from about 10 to about 25 nucleotides. The backbone and nucleotides within an oligonucleotide may be the same as those of naturally occurring nucleic acids, or analogs or derivatives of naturally occurring nucleic acids, provided that oligonucleotides having such analogs or derivatives retain the ability to hybridize specifically to a nucleic acid comprising a targeted polymorphism. Oligonucleotides described herein may be used as hybridization probes or as components of prognostic or diagnostic assays, for example, as described herein.

Oligonucleotides are typically synthesized using standard methods and equipment, such as the ABI™ 3900 High Throughput DNA Synthesizer and the EXPEDITE™ 8909 Nucleic Acid Synthesizer, both of which are available from Applied Biosystems (Foster City, Calif.). Analogs and derivatives are exemplified in U.S. Pat. Nos. 4,469,863; 5,536,821; 5,541,306; 5,637,683; 5,637,684; 5,700,922; 5,717,083; 5,719,262; 5,739,308; 5,773,601; 5,886,165; 5,929,226; 5,977,296; 6,140,482; WO 00/56746; WO 01/14398, and related publications. Methods for synthesizing oligonucleotides comprising such analogs or derivatives are disclosed, for example, in the patent publications cited above and in U.S. Pat. Nos. 5,614,622; 5,739,314; 5,955,599; 5,962,674; 6,117,992; in WO 00/75372; and in related publications.

Oligonucleotides may also be linked to a second moiety. The second moiety may be an additional nucleotide sequence such as a tail sequence (e.g., a polyadenosine tail), an adapter sequence (e.g., phage M13 universal tail sequence), and others. Alternatively, the second moiety may be a non-nucleotide moiety such as a moiety which facilitates linkage to a solid support or a label to facilitate detection of the oligonucleotide. Such labels include, without limitation, a radioactive label, a fluorescent label, a chemiluminescent label, a paramagnetic label, and the like. The second moiety may be attached to any position of the oligonucleotide, provided the oligonucleotide can hybridize to the nucleic acid comprising the polymorphism.

Uses for Nucleic Acid Sequences

Nucleic acid coding sequences depicted in FIGS. 1A-1B may be used for diagnostic purposes for detection and control of polypeptide expression. Also included herein are oligonucleotide sequences such as antisense RNA, small-interfering RNA (siRNA) and DNA molecules and ribozymes that function to inhibit translation of a polypeptide. Antisense techniques and RNA interference techniques are known in the art and are described herein.

Ribozymes are enzymatic RNA molecules capable of catalyzing the specific cleavage of RNA. The mechanism of ribozyme action involves sequence-specific hybridization of the ribozyme molecule to complementary target RNA, followed by an endonucleolytic cleavage. Ribozymes may be engineered hammerhead motif ribozyme molecules that specifically and efficiently catalyze endonucleolytic cleavage of RNA sequences corresponding to or complementary to the nucleotide sequences set forth in FIGS. 1A-1B. Specific ribozyme cleavage sites within any potential RNA target are initially identified by scanning the target molecule for ribozyme cleavage sites which include the following sequences: GUA, GUU, and GUC. Once identified, short RNA sequences of between fifteen (15) and twenty (20) ribonucleotides corresponding to the region of the target gene containing the cleavage site may be evaluated for predicted structural features such as secondary structure that may render the oligonucleotide sequence unsuitable. The suitability of candidate targets may also be evaluated by testing their accessibility to hybridization with complementary oligonucleotides, using ribonuclease protection assays.
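
The scanning step described above can be illustrated with a minimal Python sketch. It is not part of the disclosed method: the function names `find_cleavage_sites` and `candidate_windows`, the default window length of 18, and the choice to roughly center each window on the site are all assumptions made for illustration. The sketch only locates GUA, GUU and GUC triplets and extracts a 15 to 20 nucleotide region around each; it does not predict secondary structure or accessibility.

```python
import re

def find_cleavage_sites(rna):
    """Return (position, triplet) for each GUA, GUU or GUC in the sequence.

    Positions are 0-based. DNA input is accepted and converted to RNA
    (T -> U); this conversion is a convenience assumed for the sketch.
    """
    rna = rna.upper().replace("T", "U")
    return [(m.start(), m.group(0)) for m in re.finditer(r"GU[AUC]", rna)]

def candidate_windows(rna, length=18):
    """Yield (position, triplet, window) for each cleavage site.

    Each window is a region of `length` ribonucleotides (15 to 20) that
    contains the site; sequences shorter than `length` are returned whole.
    Centering the window on the site is an illustrative assumption.
    """
    if not 15 <= length <= 20:
        raise ValueError("length must be between 15 and 20 nucleotides")
    rna = rna.upper().replace("T", "U")
    last_start = max(0, len(rna) - length)
    for pos, triplet in find_cleavage_sites(rna):
        start = min(max(0, pos + 1 - length // 2), last_start)
        yield pos, triplet, rna[start:start + length]

if __name__ == "__main__":
    # Hypothetical target sequence used only to exercise the functions.
    target = "A" * 20 + "GUC" + "A" * 20
    for pos, triplet, window in candidate_windows(target):
        print(pos, triplet, window)
```

Each window returned this way would still need to be evaluated for secondary structure and for accessibility, for example by the ribonuclease protection assays mentioned above.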

Antisense RNA and DNA molecules, siRNA and ribozymes may be prepared by any method known in the art for the synthesis of RNA molecules. These include techniques for chemically synthesizing oligodeoxyribonucleotides well known in the art such as solid phase phosphoramidite chemical synthesis. Alternatively, RNA molecules may be generated by in vitro and in vivo transcription of DNA sequences encoding the antisense RNA molecule. Such DNA sequences may be incorporated into a wide variety of vectors which incorporate suitable RNA polymerase promoters such as the T7 or SP6 polymerase promoters. Alternatively, antisense cDNA constructs that synthesize antisense RNA constitutively or inducibly, depending on the promoter used, can be introduced stably into cell lines.

DNA encoding a polypeptide also may have a number of uses for the diagnosis of diseases, including breast cancer, resulting from aberrant expression of a target gene described herein. For example, the nucleic acid sequence may be used in hybridization assays of biopsies or autopsies to diagnose abnormalities of expression or function (e.g., Southern or Northern blot analysis, in situ hybridization assays).

In addition, the expression of a polypeptide during embryonic development may also be determined using nucleic acid encoding the polypeptide. As addressed, infra, production of functionally impaired polypeptide can be the cause of various disease states, such as breast cancer. In situ hybridizations using polynucleotide probes may be employed to predict problems related to breast cancer. Further, as indicated, infra, administration of human active polypeptide, recombinantly produced as described herein, may be used to treat disease states related to functionally impaired polypeptide. Alternatively, gene therapy approaches may be employed to remedy deficiencies of functional polypeptide or to replace or compete with dysfunctional polypeptide.

Expression Vectors, Host Cells, and Genetically Engineered Cells

Provided herein are nucleic acid vectors, often expression vectors, which contain a nucleotide sequence set forth in FIGS. 1A-1B or FIG. 2 or a substantially identical sequence thereof. As used herein, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked and can include a plasmid, cosmid, or viral vector. The vector can be capable of autonomous replication or it can integrate into a host DNA. Viral vectors may include replication defective retroviruses, adenoviruses and adeno-associated viruses, for example.

A vector can include a nucleotide sequence from FIGS. 1A-1B or FIG. 2 in a form suitable for expression of an encoded target polypeptide or target nucleic acid in a host cell. A “target polypeptide” is a polypeptide encoded by a nucleotide sequence from FIGS. 1A-1B or FIG. 2 or a substantially identical nucleotide sequence thereof. The recombinant expression vector typically includes one or more regulatory sequences operatively linked to the nucleic acid sequence to be expressed. The term “regulatory sequence” includes promoters, enhancers and other expression control elements (e.g., polyadenylation signals). Regulatory sequences include those that direct constitutive expression of a nucleotide sequence, as well as tissue-specific regulatory and/or inducible sequences. The design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression of polypeptide desired, and the like. Expression vectors can be introduced into host cells to produce target polypeptides, including fusion polypeptides.

Recombinant expression vectors can be designed for expression of target polypeptides in prokaryotic or eukaryotic cells. For example, target polypeptides can be expressed in E. coli, insect cells (e.g., using baculovirus expression vectors), yeast cells, or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Alternatively, the recombinant expression vector can be transcribed and translated in vitro, for example using T7 promoter regulatory sequences and T7 polymerase.

Expression of polypeptides in prokaryotes is most often carried out in E. coli with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion polypeptides. Fusion vectors add a number of amino acids to a polypeptide encoded therein, usually to the amino terminus of the recombinant polypeptide. Such fusion vectors typically serve three purposes: 1) to increase expression of recombinant polypeptide; 2) to increase the solubility of the recombinant polypeptide; and 3) to aid in the purification of the recombinant polypeptide by acting as a ligand in affinity purification. Often, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant polypeptide to enable separation of the recombinant polypeptide from the fusion moiety subsequent to purification of the fusion polypeptide. Enzymes that cleave at such sites, and their cognate recognition sequences, include Factor Xa, thrombin and enterokinase. Typical fusion expression vectors include pGEX (Pharmacia Biotech Inc; Smith & Johnson, Gene 67: 31-40 (1988)), pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.), which fuse glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to the target recombinant polypeptide.

Purified fusion polypeptides can be used in screening assays and to generate antibodies specific for target polypeptides. In a therapeutic embodiment, fusion polypeptide expressed in a retroviral expression vector is used to infect bone marrow cells that are subsequently transplanted into irradiated recipients. The pathology of the subject recipient is then examined after sufficient time has passed (e.g., six (6) weeks).

Expressing the polypeptide in host bacteria with an impaired capacity to proteolytically cleave the recombinant polypeptide is often used to maximize recombinant polypeptide expression (Gottesman, S., Gene Expression Technology: Methods in Enzymology, Academic Press, San Diego, Calif. 185: 119-128 (1990)). Another strategy is to alter the nucleotide sequence of the nucleic acid to be inserted into an expression vector so that the individual codons for each amino acid are those preferentially utilized in E. coli (Wada et al., Nucleic Acids Res. 20: 2111-2118 (1992)). Such alteration of nucleotide sequences can be carried out by standard DNA synthesis techniques.

When used in mammalian cells, the expression vector's control functions are often provided by viral regulatory elements. For example, commonly used promoters are derived from polyoma, Adenovirus 2, cytomegalovirus and Simian Virus 40. Recombinant mammalian expression vectors are often capable of directing expression of the nucleic acid in a particular cell type (e.g., tissue-specific regulatory elements are used to express the nucleic acid). Non-limiting examples of suitable tissue-specific promoters include an albumin promoter (liver-specific; Pinkert et al., Genes Dev. 1: 268-277 (1987)), lymphoid-specific promoters (Calame & Eaton, Adv. Immunol. 43: 235-275 (1988)), promoters of T cell receptors (Winoto & Baltimore, EMBO J. 8: 729-733 (1989)), promoters of immunoglobulins (Banerji et al., Cell 33: 729-740 (1983); Queen & Baltimore, Cell 33: 741-748 (1983)), neuron-specific promoters (e.g., the neurofilament promoter; Byrne & Ruddle, Proc. Natl. Acad. Sci. USA 86: 5473-5477 (1989)), pancreas-specific promoters (Edlund et al., Science 230: 912-916 (1985)), and mammary gland-specific promoters (e.g., milk whey promoter; U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). Developmentally-regulated promoters are sometimes utilized, for example, the murine hox promoters (Kessel & Gruss, Science 249: 374-379 (1990)) and the α-fetoprotein promoter (Camper & Tilghman, Genes Dev. 3: 537-546 (1989)).

A nucleic acid from FIGS. 1A-1B or FIG. 2 may also be cloned into an expression vector in an antisense orientation. Regulatory sequences (e.g., viral promoters and/or enhancers) operatively linked to a nucleic acid of FIGS. 1A-1B or FIG. 2 cloned in the antisense orientation can be chosen for directing constitutive, tissue specific or cell type specific expression of antisense RNA in a variety of cell types. Antisense expression vectors can be in the form of a recombinant plasmid, phagemid or attenuated virus. For a discussion of the regulation of gene expression using antisense genes see, e.g., Weintraub et al., Antisense RNA as a molecular tool for genetic analysis, Reviews-Trends in Genetics, Vol. 1(1) (1986).

Also provided herein are host cells that include a nucleotide sequence from FIGS. 1A-1B or FIG. 2 within a recombinant expression vector, or a fragment of a nucleotide sequence from FIGS. 1A-1B or FIG. 2 that facilitates homologous recombination into a specific site of the host cell genome. The terms “host cell” and “recombinant host cell” are used interchangeably herein. Such terms refer not only to the particular subject cell but also to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein. A host cell can be any prokaryotic or eukaryotic cell. For example, a target polypeptide can be expressed in bacterial cells such as E. coli, insect cells, yeast or mammalian cells (such as Chinese hamster ovary cells (CHO) or COS cells). Other suitable host cells are known to those skilled in the art.

Vectors can be introduced into host cells via conventional transformation or transfection techniques. As used herein, the terms “transformation” and “transfection” are intended to refer to a variety of art-recognized techniques for introducing foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium chloride co-precipitation, transduction/infection, DEAE-dextran-mediated transfection, lipofection, or electroporation.

A host cell provided herein can be used to produce (i.e., express) a target polypeptide. Accordingly, further provided are methods for producing a target polypeptide using such host cells. In one embodiment, the method includes culturing host cells into which a recombinant expression vector encoding a target polypeptide has been introduced in a suitable medium such that a target polypeptide is produced. In another embodiment, the method further includes isolating a target polypeptide from the medium or the host cell.

Also provided are cells or purified preparations of cells which include a transgene from FIGS. 1A-1B or FIG. 2, or which otherwise misexpress target polypeptide. Cell preparations can consist of human or non-human cells, e.g., rodent cells, e.g., mouse or rat cells, rabbit cells, or pig cells. In certain embodiments, the cell or cells include a transgene from FIGS. 1A-1B or FIG. 2 (e.g., a heterologous form of a gene in FIGS. 1A-1B or FIG. 2, such as a human gene expressed in non-human cells). The transgene can be misexpressed, e.g., overexpressed or underexpressed. In other embodiments, the cell or cells include a gene which misexpresses an endogenous target polypeptide (e.g., expression of a gene is disrupted, also known as a knockout). Such cells can serve as a model for studying disorders which are related to mutated or mis-expressed alleles or for use in drug screening. Also provided are human cells (e.g., hematopoietic stem cells) transformed with a nucleic acid from FIGS. 1A-1B or FIG. 2.

Also provided are cells or a purified preparation thereof (e.g., human cells) in which an endogenous nucleic acid from FIGS. 1A-1B or FIG. 2 is under the control of a regulatory sequence that does not normally control the expression of the endogenous gene corresponding to the sequence from FIGS. 1A-1B or FIG. 2. The expression characteristics of an endogenous gene within a cell (e.g., a cell line or microorganism) can be modified by inserting a heterologous DNA regulatory element into the genome of the cell such that the inserted regulatory element is operably linked to the corresponding endogenous gene. For example, an endogenous corresponding gene (e.g., a gene which is “transcriptionally silent,” not normally expressed, or expressed only at very low levels) may be activated by inserting a regulatory element which is capable of promoting the expression of a normally expressed gene product in that cell. Techniques such as targeted homologous recombination can be used to insert the heterologous DNA as described in, e.g., Chappel, U.S. Pat. No. 5,272,071; WO 91/06667, published on May 16, 1991.

Transgenic Animals

Non-human transgenic animals that express a heterologous target polypeptide (e.g., expressed from a nucleic acid from FIGS. 1A-1B or FIG. 2 or substantially identical sequence thereof) can be generated. Such animals are useful for studying the function and/or activity of a target polypeptide and for identifying and/or evaluating modulators of the activity of nucleic acids from FIGS. 1A-1B or FIG. 2 and encoded polypeptides. As used herein, a “transgenic animal” is a non-human animal such as a mammal (e.g., a non-human primate such as chimpanzee, baboon, or macaque; an ungulate such as an equine, bovine, or caprine; or a rodent such as a rat, a mouse, or an Israeli sand rat), a bird (e.g., a chicken or a turkey), an amphibian (e.g., a frog, salamander, or newt), or an insect (e.g., Drosophila melanogaster), in which one or more of the cells of the animal includes a transgene. A transgene is exogenous DNA or a rearrangement (e.g., a deletion of endogenous chromosomal DNA) that is often integrated into or occurs in the genome of cells in a transgenic animal. A transgene can direct expression of an encoded gene product in one or more cell types or tissues of the transgenic animal, and other transgenes can reduce expression (e.g., a knockout). Thus, a transgenic animal can be one in which an endogenous nucleic acid homologous to a nucleic acid from FIGS. 1A-1B or FIG. 2 has been altered by homologous recombination between the endogenous gene and an exogenous DNA molecule introduced into a cell of the animal (e.g., an embryonic cell of the animal) prior to development of the animal.

Intronic sequences and polyadenylation signals can also be included in the transgene to increase expression efficiency of the transgene. One or more tissue-specific regulatory sequences can be operably linked to a nucleotide sequence of FIGS. 1A-1B or FIG. 2 to direct expression of an encoded polypeptide to particular cells. A transgenic founder animal can be identified based upon the presence of a nucleotide sequence from FIGS. 1A-1B or FIG. 2 in its genome and/or expression of encoded mRNA in tissues or cells of the animals. A transgenic founder animal can then be used to breed additional animals carrying the transgene. Moreover, transgenic animals carrying a nucleotide sequence from FIGS. 1A-1B or FIG. 2 can further be bred to other transgenic animals carrying other transgenes.

Target polypeptides can be expressed in transgenic animals or plants by introducing, for example, a nucleic acid from FIGS. 1A-1B or FIG. 2 that encodes the target polypeptide into the genome of an animal. In certain embodiments the nucleic acid is placed under the control of a tissue-specific promoter, e.g., a milk or egg specific promoter, and the encoded polypeptide is recovered from the milk or eggs produced by the animal. Also included is a population of cells from a transgenic animal.

Target Polypeptides

Also featured herein are isolated target polypeptides, which are encoded by a nucleotide sequence from FIGS. 1A-1B or FIG. 2 or a substantially identical nucleotide sequence thereof. Such polypeptides sometimes are proteins or peptides. An “isolated” or “purified” polypeptide is substantially free of cellular material or other contaminating proteins from the cell or tissue source from which the protein is derived, or substantially free from chemical precursors or other chemicals when chemically synthesized. In one embodiment, the language “substantially free” means a preparation of a target polypeptide having less than about 30%, 20%, or 10%, and more preferably less than about 5% (by dry weight), of non-target polypeptide (also referred to herein as a “contaminating protein”), or of chemical precursors or non-target chemicals. When the target polypeptide or a biologically active portion thereof is recombinantly produced, it is also preferably substantially free of culture medium, specifically, where culture medium represents less than about 20%, sometimes less than about 10%, and often less than about 5% of the volume of the polypeptide preparation. Isolated or purified target polypeptide preparations are sometimes 0.01 milligrams or more or 0.1 milligrams or more, and often 1.0 milligrams or more or 10 milligrams or more in dry weight.

Further included herein are target polypeptide fragments, which sometimes are referred to as peptides. The polypeptide fragment may be a domain or part of a domain of a target polypeptide. The polypeptide fragment may have increased, decreased or unexpected biological activity. The polypeptide fragment is often 50 or fewer, 100 or fewer, or 200 or fewer amino acids in length, and is sometimes 300, 400, 500, 600, 700, or 900 or fewer amino acids in length.

Substantially identical target polypeptides may depart from the amino acid sequences of target polypeptides in different manners. For example, conservative amino acid modifications may be introduced at one or more positions in the amino acid sequences of target polypeptides. A “conservative amino acid substitution” is one in which the amino acid is replaced by another amino acid having a similar structure and/or chemical function. Families of amino acid residues having similar structures and functions are well known. These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Also, essential and non-essential amino acids may be replaced. A “non-essential” amino acid is one that can be altered without abolishing or substantially altering the biological function of a target polypeptide, whereas altering an “essential” amino acid abolishes or substantially alters the biological function of a target polypeptide. Amino acids that are conserved among target polypeptides are typically essential amino acids.
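
The side-chain families listed above can be encoded directly to check whether a given substitution would count as conservative under this definition. The following Python sketch is illustrative only: the dictionary `FAMILIES` and the helper names `shared_families` and `is_conservative` are assumptions introduced here, and treating membership in any one shared family as sufficient is a simplifying reading of the paragraph rather than a rule stated in it.

```python
# One-letter amino acid codes grouped by the side-chain families named in
# the text. The grouping follows the paragraph above; the data structure
# itself is an illustrative assumption.
FAMILIES = {
    "basic": set("KRH"),             # lysine, arginine, histidine
    "acidic": set("DE"),             # aspartic acid, glutamic acid
    "uncharged polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),     # threonine, valine, isoleucine
    "aromatic": set("YFWH"),
}

def shared_families(original, replacement):
    """Return the sorted names of families that contain both residues."""
    a, b = original.upper(), replacement.upper()
    return sorted(name for name, members in FAMILIES.items()
                  if a in members and b in members)

def is_conservative(original, replacement):
    """True if the two residues differ and share at least one family."""
    return (original.upper() != replacement.upper()
            and bool(shared_families(original, replacement)))

if __name__ == "__main__":
    for pair in [("K", "R"), ("D", "K"), ("F", "W"), ("T", "V")]:
        print(pair, is_conservative(*pair), shared_families(*pair))
```

Whether a conservative substitution is tolerated at a particular position still depends on whether that residue is “essential” in the sense defined above, which this sketch does not assess.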

Also, target polypeptides may exist as chimeric or fusion polypeptides. As used herein, a target “chimeric polypeptide” or target “fusion polypeptide” includes a target polypeptide linked to a non-target polypeptide. A “non-target polypeptide” refers to a polypeptide having an amino acid sequence corresponding to a polypeptide which is not substantially identical to the target polypeptide, which includes, for example, a polypeptide that is different from the target polypeptide and derived from the same or a different organism. The target polypeptide in the fusion polypeptide can correspond to an entire or nearly entire target polypeptide or a fragment thereof. The non-target polypeptide can be fused to the N-terminus or C-terminus of the target polypeptide.

Fusion polypeptides can include a moiety having high affinity for a ligand. For example, the fusion polypeptide can be a GST-target fusion polypeptide in which the target sequences are fused to the C-terminus of the GST sequences, or a polyhistidine-target fusion polypeptide in which the target polypeptide is fused at the N- or C-terminus to a string of histidine residues. Such fusion polypeptides can facilitate purification of recombinant target polypeptide. Expression vectors are commercially available that already encode a fusion moiety (e.g., a GST polypeptide), and a nucleotide sequence from FIGS. 1A-1B or FIG. 2, or a substantially identical nucleotide sequence thereof, can be cloned into an expression vector such that the fusion moiety is linked in-frame to the target polypeptide. Further, the fusion polypeptide can be a target polypeptide containing a heterologous signal sequence at its N-terminus. In certain host cells (e.g., mammalian host cells), expression, secretion, cellular internalization, and cellular localization of a target polypeptide can be increased through use of a heterologous signal sequence. Fusion polypeptides can also include all or a part of a serum polypeptide (e.g., an IgG constant region or human serum albumin).

Target polypeptides can be incorporated into pharmaceutical compositions and administered to a subject in vivo. Administration of these target polypeptides can be used to affect the bioavailability of a substrate of the target polypeptide and may effectively increase target polypeptide biological activity in a cell. Target fusion polypeptides may be useful therapeutically for the treatment of disorders caused by, for example, (i) aberrant modification or mutation of a gene encoding a target polypeptide; (ii) mis-regulation of the gene encoding the target polypeptide; and (iii) aberrant post-translational modification of a target polypeptide. Also, target polypeptides can be used as immunogens to produce anti-target antibodies in a subject, to purify target polypeptide ligands or binding partners, and in screening assays to identify molecules which inhibit or enhance the interaction of a target polypeptide with a substrate.

In addition, polypeptides can be chemically synthesized using techniques known in the art (See, e.g., Creighton, 1983 Proteins. New York, N.Y.: W. H. Freeman and Company; and Hunkapiller et al., (1984) Nature July 12-18;310(5973):105-11). For example, a relatively short polypeptide fragment can be synthesized by use of a peptide synthesizer. Furthermore, if desired, nonclassical amino acids or chemical amino acid analogs can be introduced as a substitution or addition into the fragment sequence. Non-classical amino acids include, but are not limited to, the D-isomers of the common amino acids, 2,4-diaminobutyric acid, α-amino isobutyric acid, 4-aminobutyric acid, Abu, 2-amino butyric acid, γ-Abu, ε-Ahx, 6-amino hexanoic acid, Aib, 2-amino isobutyric acid, 3-amino propionic acid, ornithine, norleucine, norvaline, hydroxyproline, sarcosine, citrulline, homocitrulline, cysteic acid, t-butylglycine, t-butylalanine, phenylglycine, cyclohexylalanine, β-alanine, fluoroamino acids, designer amino acids such as β-methyl amino acids, Cα-methyl amino acids, Nα-methyl amino acids, and amino acid analogs in general. Furthermore, the amino acid can be D (dextrorotatory) or L (levorotatory).

Also included are polypeptide fragments which are differentially modified during or after translation, e.g., by glycosylation, acetylation, phosphorylation, amidation, derivatization by known protecting/blocking groups, proteolytic cleavage, linkage to an antibody molecule or other cellular ligand, etc. Any of numerous chemical modifications may be carried out by known techniques, including but not limited to, specific chemical cleavage by cyanogen bromide, trypsin, chymotrypsin, papain, V8 protease, NaBH4; acetylation, formylation, oxidation, reduction; metabolic synthesis in the presence of tunicamycin; etc.

Additional post-translational modifications include, for example, N-linked or O-linked carbohydrate chains, processing of N-terminal or C-terminal ends, attachment of chemical moieties to the amino acid backbone, chemical modifications of N-linked or O-linked carbohydrate chains, and addition or deletion of an N-terminal methionine residue as a result of prokaryotic host cell expression. The polypeptide fragments may also be modified with a detectable label, such as an enzymatic, fluorescent, isotopic or affinity label to allow for detection and isolation of the polypeptide.

Also provided are chemically modified polypeptide derivatives that may provide additional advantages such as increased solubility, stability and circulating time of the polypeptide, or decreased immunogenicity. See U.S. Pat. No. 4,179,337. The chemical moieties for derivatization may be selected from water soluble polymers such as polyethylene glycol, ethylene glycol/propylene glycol copolymers, carboxymethylcellulose, dextran, polyvinyl alcohol and the like. The polypeptides may be modified at random positions within the molecule, or at predetermined positions within the molecule and may include one, two, three or more attached chemical moieties.

The polymer may be of any molecular weight, and may be branched or unbranched. For polyethylene glycol, the molecular weight is between about 1 kDa and about 100 kDa (the term “about” indicating that in preparations of polyethylene glycol, some molecules will weigh more, some less, than the stated molecular weight) for ease in handling and manufacturing. Other sizes may be used, depending on the desired therapeutic profile (e.g., the duration of sustained release desired, the effects, if any, on biological activity, the ease in handling, the degree or lack of antigenicity and other known effects of the polyethylene glycol to a therapeutic protein or analog).

The polyethylene glycol molecules (or other chemical moieties) should be attached to the polypeptide with consideration of effects on functional or antigenic domains of the polypeptide. There are a number of attachment methods available to those skilled in the art, e.g., EP 0 401 384, herein incorporated by reference (coupling PEG to G-CSF); see also Malik et al. (1992) Exp Hematol. September;20(8):1028-35 (reporting pegylation of GM-CSF using tresyl chloride). For example, polyethylene glycol may be covalently bound through amino acid residues via a reactive group, such as a free amino or carboxyl group. Reactive groups are those to which an activated polyethylene glycol molecule may be bound. The amino acid residues having a free amino group may include lysine residues and the N-terminal amino acid residues; those having a free carboxyl group may include aspartic acid residues, glutamic acid residues and the C-terminal amino acid residue. Sulfhydryl groups may also be used as a reactive group for attaching the polyethylene glycol molecules. A polymer sometimes is attached at an amino group, such as attachment at the N-terminus or lysine group.

One may specifically desire proteins chemically modified at the N-terminus. Using polyethylene glycol as an illustration of the present composition, one may select from a variety of polyethylene glycol molecules (by molecular weight, branching, etc.), the proportion of polyethylene glycol molecules to protein (polypeptide) molecules in the reaction mix, the type of pegylation reaction to be performed, and the method of obtaining the selected N-terminally pegylated protein. The method of obtaining the N-terminally pegylated preparation (i.e., separating this moiety from other monopegylated moieties if necessary) may be by purification of the N-terminally pegylated material from a population of pegylated protein molecules. Selective chemical modification of proteins at the N-terminus may be accomplished by reductive alkylation, which exploits differential reactivity of different types of primary amino groups (lysine versus the N-terminal) available for derivatization in a particular protein. Under the appropriate reaction conditions, substantially selective derivatization of the protein at the N-terminus with a carbonyl-group-containing polymer is achieved.

Substantially Identical Nucleic Acids and Polypeptides

Nucleotide sequences and polypeptide sequences that are substantially identical to the nucleotide sequences in FIGS. 1A-1B or FIG. 2 and the target polypeptide sequences encoded by those nucleotide sequences are included herein. The term “substantially identical” as used herein refers to two or more nucleic acids or polypeptides sharing one or more identical nucleotide sequences or polypeptide sequences, respectively. Included are nucleotide sequences or polypeptide sequences that are 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% (each often within a 1%, 2%, 3% or 4% variability) or more identical to the nucleotide sequences in FIGS. 1A-1B or FIG. 2 or the encoded target polypeptide amino acid sequences. One test for determining whether two nucleic acids are substantially identical is to determine the percent of identical nucleotide sequences or polypeptide sequences shared between the nucleic acids or polypeptides.

Calculations of sequence identity are often performed as follows. Sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). The length of a reference sequence aligned for comparison purposes is sometimes 30% or more, 40% or more, 50% or more, often 60% or more, and more often 70%, 80%, 90%, 100% of the length of the reference sequence. The nucleotides or amino acids at corresponding nucleotide or polypeptide positions, respectively, are then compared among the two sequences. When a position in the first sequence is occupied by the same nucleotide or amino acid as the corresponding position in the second sequence, the nucleotides or amino acids are deemed to be identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, introduced for optimal alignment of the two sequences.
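By way of illustration only, the position-by-position comparison described above may be sketched as follows. The sketch assumes the two sequences have already been aligned and that gaps are written as "-". The function name and the choice of denominator (every alignment column, so that each gap position lowers the result) are assumptions made for this example and are not parameters of any program named herein.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumption: every column counts toward the denominator, so each gap
    position introduced for alignment reduces the reported identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    # A column is identical only if both residues match and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))
    print(round(percent_identity("ACGT-A", "ACGTTA"), 2))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same alignment, so the convention used should be stated alongside any reported percentage.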

Comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. Percent identity between two amino acid or nucleotide sequences can be determined using the algorithm of Meyers & Miller, CABIOS 4: 11-17 (1989), which has been incorporated into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4. Also, percent identity between two amino acid sequences can be determined using the Needleman & Wunsch, J. Mol. Biol. 48: 444-453 (1970) algorithm which has been incorporated into the GAP program in the GCG software package (available at the http address www.gcg.com), using either a BLOSUM 62 matrix or a PAM250 matrix, and a gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. Percent identity between two nucleotide sequences can be determined using the GAP program in the GCG software package (available at http address www.gcg.com), using a NWSgapdna.CMP matrix and a gap weight of 40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. A set of parameters often used is a BLOSUM 62 scoring matrix with a gap open penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
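For orientation, a minimal sketch of a Needleman-Wunsch style global alignment is given below. It is not the GAP or ALIGN program and does not reproduce their parameters: it assumes a simple match/mismatch score in place of a BLOSUM or PAM matrix and a single linear gap penalty in place of separate gap and length weights. The function name and default scores are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2):
    """Globally align a and b; return (aligned_a, aligned_b, score).

    Simplifying assumptions: identity-based substitution score and a
    linear gap penalty (each gap position costs the same amount).
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

In practice an established alignment program with a published scoring matrix and affine gap penalties would be used; the sketch only shows the dynamic-programming recurrence and traceback on which such programs are based.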

Another manner for determining if two nucleic acids are substantially identical is to assess whether a polynucleotide homologous to one nucleic acid will hybridize to the other nucleic acid under stringent conditions. As used herein, the term “stringent conditions” refers to conditions for hybridization and washing. Stringent conditions are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y., 6.3.1-6.3.6 (1989). Aqueous and non-aqueous methods are described in that reference and either can be used. An example of stringent hybridization conditions is hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2×SSC, 0.1% SDS at 50° C. Another example of stringent hybridization conditions is hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2×SSC, 0.1% SDS at 55° C. A further example of stringent hybridization conditions is hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2×SSC, 0.1% SDS at 60° C. Often, stringent hybridization conditions are hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2×SSC, 0.1% SDS at 65° C. More often, stringency conditions are 0.5M sodium phosphate, 7% SDS at 65° C., followed by one or more washes at 0.2×SSC, 1% SDS at 65° C.

An example of a substantially identical nucleotide sequence to a nucleotide sequence in FIGS. 1A-1B or FIG. 2 is one that has a different nucleotide sequence but still encodes the same polypeptide sequence encoded by the nucleotide sequence in FIGS. 1A-1B or FIG. 2. Another example is a nucleotide sequence that encodes a polypeptide having a polypeptide sequence that is more than 70% identical to, sometimes more than 75%, 80%, or 85% identical to, and often more than 90% or 95% identical to a polypeptide sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2.

Nucleotide sequences from FIGS. 1A-1B or FIG. 2 and amino acid sequences of encoded polypeptides can be used as “query sequences” to perform a search against public databases to identify other family members or related sequences, for example. Such searches can be performed using the NBLAST and XBLAST programs (version 2.0) of Altschul et al., J. Mol. Biol. 215: 403-10 (1990). BLAST nucleotide searches can be performed with the NBLAST program, score=100, wordlength=12 to obtain nucleotide sequences homologous to nucleotide sequences from FIGS. 1A-1B or FIG. 2. BLAST polypeptide searches can be performed with the XBLAST program, score=50, wordlength=3 to obtain amino acid sequences homologous to polypeptides encoded by the nucleotide sequences of FIGS. 1A-1B or FIG. 2. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al., Nucleic Acids Res. 25(17): 3389-3402 (1997). When utilizing BLAST and Gapped BLAST programs, default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used (see the http address www.ncbi.nlm.nih.gov).

A nucleic acid that is substantially identical to a nucleotide sequence in FIGS. 1A-1B or FIG. 2 may include polymorphic sites at positions equivalent to those described herein when the sequences are aligned. For example, using the alignment procedures described herein, SNPs in a sequence substantially identical to a sequence in FIGS. 1A-1B or FIG. 2 can be identified at nucleotide positions that match (i.e., align) with nucleotides at SNP positions in each nucleotide sequence in FIGS. 1A-1B or FIG. 2. Also, where a polymorphic variation results in an insertion or deletion, insertion or deletion of a nucleotide sequence from a reference sequence can change the relative positions of other polymorphic sites in the nucleotide sequence.
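The effect of an insertion or deletion on the numbering of other polymorphic sites can be illustrated with a short sketch. It assumes a pairwise alignment is already available with gaps written as "-" and uses 1-based positions; the function name and the example sequences are hypothetical and are not taken from the figures.

```python
from typing import Optional


def map_position(aligned_ref: str, aligned_query: str, ref_pos: int,
                 gap: str = "-") -> Optional[int]:
    """Return the 1-based query position aligned with 1-based ref_pos.

    Returns None when the query has a gap (deletion) at that column.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_i = query_i = 0
    for r, q in zip(aligned_ref, aligned_query):
        # Advance each counter only when that sequence has a residue here.
        if r != gap:
            ref_i += 1
        if q != gap:
            query_i += 1
        if r != gap and ref_i == ref_pos:
            return query_i if q != gap else None
    raise ValueError("ref_pos lies outside the aligned reference")


if __name__ == "__main__":
    # The query carries a one-base insertion after reference position 4,
    # so reference position 5 aligns with query position 6.
    print(map_position("ACGT-ACGT", "ACGTTACGT", 5))
```

A site upstream of the insertion keeps its number, while every site downstream shifts by the length of the insertion, which is the change in relative position noted above.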

Substantially identical nucleotide and polypeptide sequences include those that are naturally occurring, such as allelic variants (same locus), splice variants, homologs (different locus), and orthologs (different organism) or can be non-naturally occurring. Non-naturally occurring variants can be generated by mutagenesis techniques, including those applied to polynucleotides, cells, or organisms. The variants can contain nucleotide substitutions, deletions, inversions and insertions. Variation can occur in either or both the coding and non-coding regions. The variations can produce both conservative and non-conservative amino acid substitutions (as compared in the encoded product). Orthologs, homologs, allelic variants, and splice variants can be identified using methods known in the art. These variants normally comprise a nucleotide sequence encoding a polypeptide that is 50%, about 55% or more, often about 70-75% or more, more often about 80-85% or more, and typically about 90-95% or more identical to the amino acid sequences of target polypeptides or a fragment thereof. Such nucleic acid molecules can readily be identified as being able to hybridize under stringent conditions to a nucleotide sequence in FIGS. 1A-1B or FIG. 2 or a fragment of this sequence. Nucleic acid molecules corresponding to orthologs, homologs, and allelic variants of a nucleotide sequence in FIGS. 1A-1B or FIG. 2 can further be identified by mapping the sequence to the same chromosome or locus as the nucleotide sequence in FIGS. 1A-1B or FIG. 2.

Also, substantially identical nucleotide sequences may include codons that are altered with respect to the naturally occurring sequence for enhancing expression of a target polypeptide in a particular expression system. For example, the nucleic acid can be one in which one or more codons are altered, and often 10% or more or 20% or more of the codons are altered for optimized expression in bacteria (e.g., E. coli), yeast (e.g., S. cerevisiae), human (e.g., 293 cells), insect, or rodent (e.g., hamster) cells.

Methods for Identifying Subjects at Risk of Breast Cancer and Breast Cancer Risk in a Subject

Methods for prognosing and diagnosing breast cancer in subjects are provided herein. These methods include detecting the presence or absence of one or more polymorphic variations associated with breast cancer in a nucleotide sequence set forth in FIGS. 1A-1B or FIG. 2, or a substantially identical sequence thereof, in a sample from a subject, where the presence of a polymorphic variant described herein is indicative of a risk of breast cancer.

Thus, featured herein is a method for detecting a subject at risk of breast cancer or the risk of breast cancer in a subject, which comprises detecting the presence or absence of a polymorphic variation associated with breast cancer at a polymorphic site in a nucleotide sequence set forth in FIGS. 1A-1B or FIG. 2 in a nucleic acid sample from a subject, where the nucleotide sequence comprises a polynucleotide sequence selected from the group consisting of: (a) a nucleotide sequence set forth in FIGS. 1A-1B or FIG. 2; (b) a nucleotide sequence which encodes a polypeptide having an amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (c) a nucleotide sequence which encodes a polypeptide that is 90% or more identical to an amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2 or a nucleotide sequence about 90% or more identical to the nucleotide sequence set forth in FIGS. 1A-1B or FIG. 2; and (d) a fragment of a nucleotide sequence of (a), (b), or (c), often a fragment that includes a polymorphic site associated with breast cancer; whereby the presence of the polymorphic variation is indicative of a risk of breast cancer in the subject. In certain embodiments, the presence of a combination of two or more polymorphic variants associated with breast cancer in one or more nucleotide sequences in FIGS. 1A-1B or FIG. 2 of the sample is determined to identify a subject at risk of breast cancer and/or risk of breast cancer.

Results from prognostic tests may be combined with other test results to diagnose breast cancer. For example, prognostic results may be gathered, a patient sample may be ordered based on a determined predisposition to breast cancer, the patient sample may be analyzed, and the results of the analysis may be utilized to diagnose breast cancer. Also, breast cancer diagnostic methods can be developed from studies used to generate prognostic/diagnostic methods in which populations are stratified into subpopulations having different progressions of breast cancer. In another embodiment, prognostic results may be gathered; a patient's risk factors for developing breast cancer analyzed (e.g., age, race, family history, age of first menstrual cycle, age at birth of first child); and a patient sample may be ordered based on a determined predisposition to breast cancer. In an alternative embodiment, the results from predisposition analyses described herein may be combined with other test results indicative of breast cancer, which were previously, concurrently, or subsequently gathered with respect to the predisposition testing. In these embodiments, the combination of the prognostic test results with other test results can be probative of breast cancer, and the combination can be utilized as a breast cancer diagnostic. The results of any test indicative of breast cancer known in the art may be combined with the methods described herein. Examples of such tests are mammography (e.g., a more frequent and/or earlier mammography regimen may be prescribed); breast biopsy and optionally a biopsy from another tissue; breast ultrasound and optionally an ultrasound analysis of another tissue; breast magnetic resonance imaging (MRI) and optionally an MRI analysis of another tissue; electrical impedance (T-scan) analysis of breast and optionally of another tissue; ductal lavage; nuclear medicine analysis (e.g., scintimammography); BRCA1 and/or BRCA2 sequence analysis results; and thermal imaging of the breast and optionally of another tissue. Testing may be performed on tissue other than breast to diagnose the occurrence of metastasis (e.g., testing of the lymph node).

Risk of breast cancer sometimes is expressed as a probability, such as an odds ratio, percentage, or risk factor. The risk is based upon the presence or absence of one or more polymorphic variants described herein, and also may be based in part upon phenotypic traits of the individual being tested. Methods for calculating predispositions based upon patient data are well known (see, e.g., Agresti, Categorical Data Analysis, 2nd Ed. 2002. Wiley). Allelotyping and genotyping analyses may be carried out in populations other than those exemplified herein to enhance the predictive power of the prognostic method. These further analyses are executed in view of the exemplified procedures described herein, and may be based upon the same polymorphic variations or additional polymorphic variations. Risk determinations for breast cancer are useful in a variety of applications. In one embodiment, breast cancer risk determinations are used by clinicians to direct appropriate detection, preventative and treatment procedures to subjects who most require these. In another embodiment, breast cancer risk determinations are used by health insurers for preparing actuarial tables and for calculating insurance premiums.
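As one illustration of expressing risk as an odds ratio, the sketch below computes the ratio from a two-by-two table of allele carriers and non-carriers among cases and controls, with an approximate 95% confidence interval obtained by the common log-based (Woolf) method. The function name and the counts in the usage example are hypothetical and are not results from the studies described herein.

```python
import math


def odds_ratio(case_carriers: int, case_noncarriers: int,
               control_carriers: int, control_noncarriers: int,
               z: float = 1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    The interval is exp(ln(OR) +/- z * SE), where SE is the square root
    of the sum of reciprocal cell counts. All four cells must be positive.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    # Odds of carrying the allele in cases divided by the odds in controls.
    estimate = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 cases and 15 of 100 controls carry the allele.
    print(odds_ratio(30, 70, 15, 85))
```

An interval that excludes 1 suggests an association at roughly the 5% level; an analysis that also accounts for phenotypic traits would typically use a regression model instead of a single table.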

The nucleic acid sample typically is isolated from a biological sample obtained from a subject. For example, nucleic acid can be isolated from blood, saliva, sputum, urine, cell scrapings, and biopsy tissue. The nucleic acid sample can be isolated from a biological sample using standard techniques, such as the technique described in Example 2. As used herein, the term “subject” refers primarily to humans but also refers to other mammals such as dogs, cats, and ungulates (e.g., cattle, sheep, and swine). Subjects also include avians (e.g., chickens and turkeys), reptiles, and fish (e.g., salmon), as embodiments described herein can be adapted to nucleic acid samples isolated from any of these organisms. The nucleic acid sample may be isolated from the subject and then directly utilized in a method for determining the presence of a polymorphic variant, or alternatively, the sample may be isolated and then stored (e.g., frozen) for a period of time before being subjected to analysis.

The presence or absence of a polymorphic variant is determined using one or both chromosomal complements represented in the nucleic acid sample. Determining the presence or absence of a polymorphic variant in both chromosomal complements represented in a nucleic acid sample from a subject having a copy of each chromosome is useful for determining the zygosity of an individual for the polymorphic variant (i.e., whether the individual is homozygous or heterozygous for the polymorphic variant). Any oligonucleotide-based diagnostic may be utilized to determine whether a polymorphic variant is present or absent in a sample. For example, primer extension methods, ligase sequence determination methods (e.g., U.S. Pat. Nos. 5,679,524 and 5,952,174, and WO 01/27326), mismatch sequence determination methods (e.g., U.S. Pat. Nos. 5,851,770; 5,958,692; 6,110,684; and 6,183,958), microarray sequence determination methods, restriction fragment length polymorphism (RFLP), single strand conformation polymorphism detection (SSCP) (e.g., U.S. Pat. Nos. 5,891,625 and 6,013,499), PCR-based assays (e.g., TAQMAN® PCR System (Applied Biosystems)), and nucleotide sequencing methods may be used.

Oligonucleotide extension methods typically involve providing a pair of oligonucleotide primers in a polymerase chain reaction (PCR) or in other nucleic acid amplification methods for the purpose of amplifying a region from the nucleic acid sample that comprises the polymorphic variation. One oligonucleotide primer is complementary to a region 3′ of the polymorphism and the other is complementary to a region 5′ of the polymorphism. A PCR primer pair may be used in methods disclosed in U.S. Pat. Nos. 4,683,195; 4,683,202; 4,965,188; 5,656,493; 5,998,143; 6,140,054; WO 01/27327; and WO 01/27329 for example. PCR primer pairs may also be used in any commercially available machines that perform PCR, such as any of the GENEAMP® Systems available from Applied Biosystems. Also, those of ordinary skill in the art will be able to design oligonucleotide primers based upon a nucleotide sequence set forth in FIGS. 1A-1B without undue experimentation using knowledge readily available in the art.

Also provided is an extension oligonucleotide that hybridizes to the amplified fragment adjacent to the polymorphic variation. As used herein, the term “adjacent” refers to the 3′ end of the extension oligonucleotide being often 1 nucleotide from the 5′ end of the polymorphic site, and sometimes 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides from the 5′ end of the polymorphic site, in the nucleic acid when the extension oligonucleotide is hybridized to the nucleic acid. The extension oligonucleotide then is extended by one or more nucleotides, and the number and/or type of nucleotides that are added to the extension oligonucleotide determine whether the polymorphic variant is present. Oligonucleotide extension methods are disclosed, for example, in U.S. Pat. Nos. 4,656,127; 4,851,331; 5,679,524; 5,834,189; 5,876,934; 5,908,755; 5,912,118; 5,976,802; 5,981,186; 6,004,744; 6,013,431; 6,017,702; 6,046,005; 6,087,095; 6,210,891; and WO 01/20039. Oligonucleotide extension methods using mass spectrometry are described, for example, in U.S. Pat. Nos. 5,547,835; 5,605,798; 5,691,141; 5,849,542; 5,869,242; 5,928,906; 6,043,031; and 6,194,144, and a method often utilized is described herein in Example 2. Multiple extension oligonucleotides may be utilized in one reaction, which is referred to herein as “multiplexing.”
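The positional meaning of “adjacent” can be illustrated with a sketch that selects the stretch of sequence an extension oligonucleotide would occupy. The sketch rests on several assumptions that are not taken from the text: an offset of 1 is read as the 3′ terminal base lying immediately 5′ of the polymorphic site, the returned oligonucleotide has the same sequence as the strand supplied (and so would hybridize to the complementary strand), and the function name, the default length of 20 and the example sequence are hypothetical.

```python
def extension_oligo(strand: str, snp_pos: int, length: int = 20,
                    offset: int = 1) -> str:
    """Return the oligonucleotide whose 3' end lies `offset` bases 5' of the site.

    strand  : sequence written 5' to 3'
    snp_pos : 1-based position of the polymorphic site on that strand
    offset  : 1 to 10, following the range of distances given in the text
    """
    if not 1 <= offset <= 10:
        raise ValueError("offset must be between 1 and 10")
    if not 1 <= snp_pos <= len(strand):
        raise ValueError("snp_pos lies outside the sequence")
    end = snp_pos - offset          # 1-based position of the 3' terminal base
    start = end - length + 1        # 1-based position of the 5' terminal base
    if start < 1:
        raise ValueError("not enough upstream sequence for this length")
    return strand[start - 1:end]


if __name__ == "__main__":
    # Site at position 9; a 4-base oligonucleotide ending at position 8.
    print(extension_oligo("AACCGGTTAC", snp_pos=9, length=4, offset=1))
```

With an offset of 1, the first nucleotide added during extension is the one opposite the polymorphic site, which is why the identity of the added nucleotide reports the variant.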

A microarray can be utilized for determining whether a polymorphic variant is present or absent in a nucleic acid sample. A microarray may include any oligonucleotides described herein, and methods for making and using oligonucleotide microarrays suitable for diagnostic use are disclosed in U.S. Pat. Nos. 5,492,806; 5,525,464; 5,589,330; 5,695,940; 5,849,483; 6,018,041; 6,045,996; 6,136,541; 6,142,681; 6,156,501; 6,197,506; 6,223,127; 6,225,625; 6,229,911; 6,239,273; WO 00/52625; WO 01/25485; and WO 01/29259. The microarray typically comprises a solid support and the oligonucleotides may be linked to this solid support by covalent bonds or by non-covalent interactions. The oligonucleotides may also be linked to the solid support directly or by a spacer molecule. A microarray may comprise one or more oligonucleotides complementary to a polymorphic site set forth in FIGS. 1A-1B or FIG. 2.

A kit also may be utilized for determining whether a polymorphic variant is present or absent in a nucleic acid sample. A kit often comprises one or more pairs of oligonucleotide primers useful for amplifying a fragment of a sequence set forth in FIGS. 1A-1B or FIG. 2 or a substantially identical sequence thereof, where the fragment includes a polymorphic site. The kit sometimes comprises a polymerizing agent, for example, a thermostable nucleic acid polymerase such as one disclosed in U.S. Pat. Nos. 4,889,818 or 6,077,664. Also, the kit often comprises an elongation oligonucleotide that hybridizes to a nucleic acid set forth in FIGS. 1A-1B in a nucleic acid sample adjacent to the polymorphic site. Where the kit includes an elongation oligonucleotide, it also often comprises chain elongating nucleotides, such as dATP, dTTP, dGTP, dCTP, and dITP, including analogs of dATP, dTTP, dGTP, dCTP and dITP, provided that such analogs are substrates for a thermostable nucleic acid polymerase and can be incorporated into a nucleic acid chain elongated from the extension oligonucleotide. Along with chain elongating nucleotides would be one or more chain terminating nucleotides such as ddATP, ddTTP, ddGTP, ddCTP, and the like. In an embodiment, the kit comprises one or more oligonucleotide primer pairs, a polymerizing agent, chain elongating nucleotides, at least one elongation oligonucleotide, and one or more chain terminating nucleotides. Kits optionally include buffers, vials, microtiter plates, and instructions for use.

An individual identified as being at risk of breast cancer may be heterozygous or homozygous with respect to the allele associated with a higher risk of breast cancer (e.g., see last column in FIGS. 1A-1B). A subject homozygous for an allele associated with an increased risk of breast cancer is at a comparatively high risk of breast cancer, a subject heterozygous for an allele associated with an increased risk of breast cancer is at a comparatively intermediate risk of breast cancer, and a subject homozygous for an allele associated with a decreased risk of breast cancer is at a comparatively low risk of breast cancer. A genotype may be assessed for a complementary strand, such that the complementary nucleotide at a particular position is detected.
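The three comparative risk levels described above, together with detection on the complementary strand, can be sketched as a simple lookup. The sketch assumes a biallelic site, so that a subject carrying no copy of the risk-associated allele is homozygous for the other allele; the function name, the two-letter genotype notation and the example alleles are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def comparative_risk(genotype: str, risk_allele: str,
                     on_complementary_strand: bool = False) -> str:
    """Classify a two-allele genotype as 'high', 'intermediate' or 'low'.

    genotype    : the two observed alleles, e.g. "AG"
    risk_allele : allele associated with increased risk, on the reference strand
    on_complementary_strand : True if the genotype was read on the other strand
    """
    alleles = [base.upper() for base in genotype]
    if len(alleles) != 2 or any(base not in COMPLEMENT for base in alleles):
        raise ValueError("genotype must be two bases drawn from A, C, G, T")
    if on_complementary_strand:
        # Convert the detected bases back to the reference strand.
        alleles = [COMPLEMENT[base] for base in alleles]
    copies = alleles.count(risk_allele.upper())
    # 0 copies: homozygous for the other allele; 1: heterozygous; 2: homozygous.
    return ("low", "intermediate", "high")[copies]


if __name__ == "__main__":
    print(comparative_risk("AG", "A"))
    print(comparative_risk("TT", "A", on_complementary_strand=True))
```

The labels are comparative only; they rank genotypes at a single site against one another and do not by themselves give an absolute probability of disease.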

Also featured are methods for determining risk of breast cancer and/or identifying a subject at risk of breast cancer by contacting a polypeptide or protein encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2 from a subject with an antibody that specifically binds to an epitope associated with increased risk of breast cancer in the polypeptide.

Applications of Prognostic and Diagnostic Results to Pharmacogenomic Methods

Pharmacogenomics is a discipline that involves tailoring a treatment for a subject according to the subject's genotype. For example, based upon the outcome of a prognostic test described herein, a clinician or physician may target pertinent information and preventative or therapeutic treatments to a subject who would be benefited by the information or treatment and avoid directing such information and treatments to a subject who would not be benefited (e.g., the treatment has no therapeutic effect and/or the subject experiences adverse side effects). As therapeutic approaches for breast cancer continue to evolve and improve, the goal of treatments for breast cancer related disorders is to intervene even before clinical signs (e.g., identification of a lump in the breast) first manifest. Thus, genetic markers associated with susceptibility to breast cancer prove useful for early diagnosis, prevention and treatment of breast cancer.

The following is an example of a pharmacogenomic embodiment. A particular treatment regimen can exert a differential effect depending upon the subject's genotype. Where a candidate therapeutic exhibits a significant interaction with a major allele and a comparatively weak interaction with a minor allele (e.g., an order of magnitude or greater difference in the interaction), such a therapeutic typically would not be administered to a subject genotyped as being homozygous for the minor allele, and sometimes not administered to a subject genotyped as being heterozygous for the minor allele. In another example, where a candidate therapeutic is not significantly toxic when administered to subjects who are homozygous for a major allele but is comparatively toxic when administered to subjects heterozygous or homozygous for a minor allele, the candidate therapeutic is not typically administered to subjects who are genotyped as being heterozygous or homozygous with respect to the minor allele.

The methods described herein are applicable to pharmacogenomic methods for detecting, preventing, alleviating and/or treating breast cancer. For example, a nucleic acid sample from an individual may be subjected to a genetic test described herein. Where one or more polymorphic variations associated with increased risk of breast cancer are identified in a subject, information for detecting, preventing or treating breast cancer and/or one or more breast cancer detection, prevention and/or treatment regimens then may be directed to and/or prescribed to that subject.

In certain embodiments, a detection, preventative and/or treatment regimen is specifically prescribed and/or administered to individuals who will most benefit from it based upon their risk of developing breast cancer assessed by the methods described herein. Thus, provided are methods for identifying a subject at risk of breast cancer and then prescribing a detection, therapeutic or preventative regimen to individuals identified as being at risk of breast cancer. Thus, certain embodiments are directed to methods for treating breast cancer in a subject, reducing risk of breast cancer in a subject, or early detection of breast cancer in a subject, which comprise: detecting the presence or absence of a polymorphic variant associated with breast cancer in a nucleotide sequence set forth in FIGS. 1A-1B or FIG. 2 in a nucleic acid sample from a subject, where the nucleotide sequence comprises a polynucleotide sequence selected from the group consisting of: (a) a nucleotide sequence set forth in FIGS. 1A-1B or FIG. 2; (b) a nucleotide sequence which encodes a polypeptide having an amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (c) a nucleotide sequence which encodes a polypeptide that is 90% or more identical to an amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2 or a nucleotide sequence about 90% or more identical to the nucleotide sequence set forth in FIGS. 1A-1B or FIG. 2; and (d) a fragment of a nucleotide sequence of (a), (b), or (c), sometimes comprising a polymorphic site associated with breast cancer; and prescribing or administering a breast cancer treatment regimen, preventative regimen and/or detection regimen to a subject from whom the sample originated where the presence of one or more polymorphic variations associated with breast cancer is detected in the nucleotide sequence. In these methods, genetic results may be utilized in combination with other test results to diagnose breast cancer as described above. Other test results include but are not limited to mammography results, imaging results, biopsy results and BRCA1 or BRCA2 test results, as described above.

Detection regimens include one or more mammography procedures, a regular mammography regimen (e.g., once a year, or once every six, four, three or two months); an early mammography regimen (e.g., mammography tests are performed beginning at age 25, 30, or 35); one or more biopsy procedures (e.g., a regular biopsy regimen beginning at age 40); breast biopsy and biopsy from other tissue; breast ultrasound and optionally ultrasound analysis of another tissue; breast magnetic resonance imaging (MRI) and optionally MRI analysis of another tissue; electrical impedance (T-scan) analysis of breast and optionally another tissue; ductal lavage; nuclear medicine analysis (e.g., scintimammography); BRCA1 and/or BRCA2 sequence analysis results; and/or thermal imaging of the breast and optionally another tissue.

Treatments sometimes are preventative (e.g., are prescribed or administered to reduce the probability that a breast cancer associated condition arises or progresses), sometimes are therapeutic, and sometimes delay, alleviate or halt the progression of breast cancer. Any known preventative or therapeutic treatment for alleviating or preventing the occurrence of breast cancer is prescribed and/or administered. For example, certain preventative treatments often are prescribed to subjects having a predisposition to breast cancer and where the subject is not diagnosed with breast cancer or is diagnosed as having symptoms indicative of early stage breast cancer (e.g., stage I). For subjects not diagnosed as having breast cancer, any preventative treatments known in the art can be prescribed and administered, which include selective hormone receptor modulators (e.g., selective estrogen receptor modulators (SERMs) such as tamoxifen, raloxifene, and toremifene); compositions that prevent production of hormones (e.g., aromatase inhibitors that prevent the production of estrogen in the adrenal gland, such as exemestane, letrozole, anastrozole, goserelin, and megestrol); other hormonal treatments (e.g., goserelin acetate and fulvestrant); biologic response modifiers such as antibodies (e.g., trastuzumab (herceptin/HER2)); surgery (e.g., lumpectomy and mastectomy); drugs that delay or halt metastasis (e.g., pamidronate disodium); and alternative/complementary medicine (e.g., acupuncture, acupressure, moxibustion, qi gong, reiki, ayurveda, vitamins, minerals, and herbs (e.g., astragalus root, burdock root, garlic, green tea, and licorice root)).

Breast cancer treatments are well known in the art, and include surgery, chemotherapy and/or radiation therapy. Any of the treatments may be used in combination to treat or prevent breast cancer (e.g., surgery followed by radiation therapy or chemotherapy). Examples of chemotherapy combinations used to treat breast cancer include: cyclophosphamide (Cytoxan), methotrexate (Amethopterin, Mexate, Folex), and fluorouracil (Fluorouracil, 5-FU, Adrucil), which is referred to as CMF; cyclophosphamide, doxorubicin (Adriamycin), and fluorouracil, which is referred to as CAF; and doxorubicin (Adriamycin) and cyclophosphamide, which is referred to as AC.

As breast cancer preventative and treatment information can be specifically targeted to subjects in need thereof (e.g., those at risk of developing breast cancer or those that have early signs of breast cancer), provided herein is a method for preventing or reducing the risk of developing breast cancer in a subject, which comprises: (a) detecting the presence or absence of a polymorphic variation associated with breast cancer at a polymorphic site in a nucleotide sequence in a nucleic acid sample from a subject; (b) identifying a subject with a predisposition to breast cancer, whereby the presence of the polymorphic variation is indicative of a predisposition to breast cancer in the subject; and (c) if such a predisposition is identified, providing the subject with information about methods or products to prevent or reduce breast cancer or to delay the onset of breast cancer. Also provided is a method of targeting information or advertising to a subpopulation of a human population based on the subpopulation being genetically predisposed to a disease or condition, which comprises: (a) detecting the presence or absence of a polymorphic variation associated with breast cancer at a polymorphic site in a nucleotide sequence in a nucleic acid sample from a subject; (b) identifying the subpopulation of subjects in which the polymorphic variation is associated with breast cancer; and (c) providing information only to the subpopulation of subjects about a particular product which may be obtained and consumed or applied by the subject to help prevent or delay onset of the disease or condition.

Pharmacogenomics methods also may be used to analyze and predict a response to a breast cancer treatment or a drug. For example, if pharmacogenomics analysis indicates a likelihood that an individual will respond positively to a breast cancer treatment with a particular drug, the drug may be administered to the individual. Conversely, if the analysis indicates that an individual is likely to respond negatively to treatment with a particular drug, an alternative course of treatment may be prescribed. A negative response may be defined as either the absence of an efficacious response or the presence of toxic side effects. The response to a therapeutic treatment can be predicted in a background study in which subjects in any of the following populations are genotyped: a population that responds favorably to a treatment regimen, a population that does not respond significantly to a treatment regimen, and a population that responds adversely to a treatment regimen (e.g., exhibits one or more side effects). These populations are provided as examples and other populations and subpopulations may be analyzed. Based upon the results of these analyses, a subject is genotyped to predict whether he or she will respond favorably to a treatment regimen, not respond significantly to a treatment regimen, or respond adversely to a treatment regimen.

The methods described herein also are applicable to clinical drug trials. One or more polymorphic variants indicative of response to an agent for treating breast cancer or to side effects to an agent for treating breast cancer may be identified using the methods described herein. Thereafter, potential participants in clinical trials of such an agent may be screened to identify those individuals most likely to respond favorably to the drug and exclude those likely to experience side effects. In that way, the effectiveness of drug treatment may be measured in individuals who respond positively to the drug, without lowering the measurement as a result of the inclusion of individuals who are unlikely to respond positively in the study and without risking undesirable safety problems.

Thus, another embodiment is a method of selecting an individual for inclusion in a clinical trial of a treatment or drug comprising the steps of: (a) obtaining a nucleic acid sample from an individual; (b) determining the identity of a polymorphic variation which is associated with a positive response to the treatment or the drug, or at least one polymorphic variation which is associated with a negative response to the treatment or the drug in the nucleic acid sample; and (c) including the individual in the clinical trial if the nucleic acid sample contains said polymorphic variation associated with a positive response to the treatment or the drug or if the nucleic acid sample lacks said polymorphic variation associated with a negative response to the treatment or the drug. In addition, the methods for selecting an individual for inclusion in a clinical trial of a treatment or drug encompass methods with any further limitation described in this disclosure, or those following, specified alone or in any combination. The polymorphic variation may be in a sequence selected individually or in any combination from the group consisting of (i) a polynucleotide sequence set forth in FIGS. 1A-1B or FIG. 2; (ii) a polynucleotide sequence that is 90% or more identical to a nucleotide sequence set forth in FIGS. 1A-1B or FIG. 2; (iii) a polynucleotide sequence that encodes a polypeptide having an amino acid sequence identical to or 90% or more identical to an amino acid sequence encoded by a nucleotide sequence set forth in FIGS. 1A-1B or FIG. 2; and (iv) a fragment of a polynucleotide sequence of (i), (ii), or (iii) comprising the polymorphic site. The including step (c) optionally comprises administering the drug or the treatment to the individual if the nucleic acid sample contains the polymorphic variation associated with a positive response to the treatment or the drug and the nucleic acid sample lacks said polymorphic variation associated with a negative response to the treatment or the drug.
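The decision logic of step (c) and of the optional administering step can be sketched as follows. This is a minimal illustration in Python, assuming the genotyping result is available as a set of detected variant identifiers; the function names and the identifiers "POS_1" and "NEG_1" are hypothetical and are not taken from the figures.

```python
from typing import Iterable, Set

def _has_any(detected: Set[str], variants: Iterable[str]) -> bool:
    """True if at least one of `variants` was detected in the sample."""
    return any(v in detected for v in variants)

def include_in_trial(detected: Set[str],
                     positive: Iterable[str],
                     negative: Iterable[str]) -> bool:
    """Step (c): include if the sample contains a positive-response variation
    OR lacks the negative-response variation(s)."""
    return _has_any(detected, positive) or not _has_any(detected, negative)

def administer_treatment(detected: Set[str],
                         positive: Iterable[str],
                         negative: Iterable[str]) -> bool:
    """Optional administering step: requires a positive-response variation
    AND absence of the negative-response variation(s)."""
    return _has_any(detected, positive) and not _has_any(detected, negative)

POSITIVE = ["POS_1"]  # hypothetical variant associated with a positive response
NEGATIVE = ["NEG_1"]  # hypothetical variant associated with a negative response

both = {"POS_1", "NEG_1"}
include_both = include_in_trial(both, POSITIVE, NEGATIVE)
administer_both = administer_treatment(both, POSITIVE, NEGATIVE)
```

Note that the inclusion test is a disjunction while the administering test is a conjunction, so a sample carrying both variations qualifies for inclusion under step (c) but not for the optional administering step.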

Also provided herein is a method of partnering between a diagnostic/prognostic testing provider and a provider of a consumable product, which comprises: (a) the diagnostic/prognostic testing provider detects the presence or absence of a polymorphic variation associated with breast cancer at a polymorphic site in a nucleotide sequence in a nucleic acid sample from a subject; (b) the diagnostic/prognostic testing provider identifies the subpopulation of subjects in which the polymorphic variation is associated with breast cancer; (c) the diagnostic/prognostic testing provider forwards information to the subpopulation of subjects about a particular product which may be obtained and consumed or applied by the subject to help prevent or delay onset of the disease or condition; and (d) the provider of a consumable product forwards to the diagnostic/prognostic testing provider a fee every time the diagnostic/prognostic testing provider forwards information to the subject as set forth in step (c) above.

Compositions Comprising Breast Cancer-Directed Molecules

Featured herein is a composition comprising a breast cancer cell and one or more molecules specifically directed to a nucleic acid comprising a nucleotide sequence in FIGS. 1A-1B or FIG. 2, or a protein, polypeptide or peptide encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2. Such directed molecules include, but are not limited to, a compound that binds to a nucleic acid having a nucleotide sequence in FIGS. 1A-1B or FIG. 2, or substantially identical nucleotide sequence thereof, or polypeptide encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; an RNAi or siRNA molecule having a strand complementary to a nucleotide sequence in FIGS. 1A-1B or FIG. 2; an antisense nucleic acid complementary to an RNA encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; a ribozyme that hybridizes to a nucleotide sequence in FIGS. 1A-1B or FIG. 2; a polypeptide, protein or fragment thereof encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2, or a nucleotide sequence in FIGS. 1A-1B or FIG. 2 or a substantially identical nucleotide sequence thereof; a nucleic acid aptamer that specifically binds a peptide, polypeptide, or protein encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; and an antibody that specifically binds to a peptide, polypeptide, or protein encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2, or binds to a nucleic acid or variant in FIGS. 1A-1B or FIG. 2 associated with breast cancer.

Compositions sometimes include an adjuvant known to stimulate an immune response, and in certain embodiments, an adjuvant that stimulates a T-cell lymphocyte response. Adjuvants are known, including but not limited to an aluminum adjuvant (e.g., aluminum hydroxide); a cytokine adjuvant or adjuvant that stimulates a cytokine response (e.g., interleukin (IL)-12 and/or γ-interferon cytokines); a Freund-type mineral oil adjuvant emulsion (e.g., Freund's complete or incomplete adjuvant); a synthetic lipoid compound; a copolymer adjuvant (e.g., TitreMax); a saponin; Quil A; a liposome; an oil-in-water emulsion (e.g., an emulsion stabilized by Tween 80 and pluronic polyoxyethylene/polyoxypropylene block copolymer (Syntex Adjuvant Formulation); TitreMax; detoxified endotoxin (MPL) and mycobacterial cell wall components (TDM, CWS) in 2% squalene (Ribi Adjuvant System)); a muramyl dipeptide; an immune-stimulating complex (ISCOM, e.g., an Ag-modified saponin/cholesterol micelle that forms a stable cage-like structure); an aqueous phase adjuvant that does not have a depot effect (e.g., Gerbu adjuvant); a carbohydrate polymer (e.g., AdjuPrime); L-tyrosine; a mannide-oleate compound (e.g., Montanide); an ethylene-vinyl acetate copolymer (e.g., Elvax 40W1,2); or lipid A, for example. Such compositions are useful for generating an immune response against a breast cancer directed molecule (e.g., an HLA-binding subsequence within a polypeptide encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2). In such methods, a peptide having an amino acid subsequence of a polypeptide encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2 is delivered to a subject, where the subsequence binds to an HLA molecule and induces a CTL lymphocyte response. The peptide sometimes is delivered to the subject as an isolated peptide or as a minigene in a plasmid that encodes the peptide. Methods for identifying HLA-binding subsequences in such polypeptides are known (see e.g., publication WO02/20616 and PCT application US98/01373 for methods of identifying such sequences).

The breast cancer cell may be in a group of breast cancer cells and/or other types of cells cultured in vitro or in a tissue having breast cancer cells (e.g., a melanocytic lesion) maintained in vitro or present in an animal in vivo (e.g., a rat, mouse, ape or human). In certain embodiments, a composition comprises a component from a breast cancer cell or from a subject having a breast cancer cell instead of the breast cancer cell or in addition to the breast cancer cell, where the component sometimes is a nucleic acid molecule (e.g., genomic DNA), a protein mixture or isolated protein, for example. The aforementioned compositions have utility in diagnostic, prognostic and pharmacogenomic methods described previously and in breast cancer therapeutics described hereafter. Certain breast cancer molecules are described in greater detail below.

Compounds

Compounds can be obtained using any of the numerous approaches in combinatorial library methods known in the art, including: biological libraries; peptoid libraries (libraries of molecules having the functionalities of peptides, but with a novel, non-peptide backbone, which are resistant to enzymatic degradation but which nevertheless remain bioactive (see, e.g., Zuckermann et al., J. Med. Chem. 37: 2678-85 (1994))); spatially addressable parallel solid phase or solution phase libraries; synthetic library methods requiring deconvolution; “one-bead one-compound” library methods; and synthetic library methods using affinity chromatography selection. Biological library and peptoid library approaches are typically limited to peptide libraries, while the other approaches are applicable to peptide, non-peptide oligomer or small molecule libraries of compounds (Lam, Anticancer Drug Des. 12: 145 (1997)). Examples of methods for synthesizing molecular libraries are described, for example, in DeWitt et al., Proc. Natl. Acad. Sci. USA 90: 6909 (1993); Erb et al., Proc. Natl. Acad. Sci. USA 91: 11422 (1994); Zuckermann et al., J. Med. Chem. 37: 2678 (1994); Cho et al., Science 261: 1303 (1993); Carell et al., Angew. Chem. Int. Ed. Engl. 33: 2059 (1994); Carell et al., Angew. Chem. Int. Ed. Engl. 33: 2061 (1994); and in Gallop et al., J. Med. Chem. 37: 1233 (1994).

Libraries of compounds may be presented in solution (e.g., Houghten, Biotechniques 13: 412-421 (1992)), or on beads (Lam, Nature 354: 82-84 (1991)), chips (Fodor, Nature 364: 555-556 (1993)), bacteria or spores (Ladner, U.S. Pat. No. 5,223,409), plasmids (Cull et al., Proc. Natl. Acad. Sci. USA 89: 1865-1869 (1992)) or on phage (Scott and Smith, Science 249: 386-390 (1990); Devlin, Science 249: 404-406 (1990); Cwirla et al., Proc. Natl. Acad. Sci. 87: 6378-6382 (1990); Felici, J. Mol. Biol. 222: 301-310 (1991); Ladner supra.).

A compound may alter expression or activity of a polypeptide encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2 and may be a small molecule. Small molecules include, but are not limited to, peptides, peptidomimetics (e.g., peptoids), amino acids, amino acid analogs, polynucleotides, polynucleotide analogs, nucleotides, nucleotide analogs, organic or inorganic compounds (i.e., including heteroorganic and organometallic compounds) having a molecular weight less than about 10,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 5,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 1,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 500 grams per mole, and salts, esters, and other pharmaceutically acceptable forms of such compounds.

Antisense Nucleic Acid Molecules, Ribozymes, RNAi, siRNA and Modified Nucleic Acid Molecules

An “antisense” nucleic acid refers to a nucleotide sequence complementary to a “sense” nucleic acid encoding a polypeptide, e.g., complementary to the coding strand of a double-stranded cDNA molecule or complementary to an mRNA sequence. The antisense nucleic acid can be complementary to an entire coding strand in FIGS. 1A-1B or FIG. 2, or to only a portion thereof. In another embodiment, the antisense nucleic acid molecule is antisense to a “noncoding region” of the coding strand of a nucleotide sequence in FIGS. 1A-1B or FIG. 2 (e.g., 5′ and 3′ untranslated regions).

An antisense nucleic acid can be designed such that it is complementary to the entire coding region of an mRNA encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2, and often the antisense nucleic acid is an oligonucleotide antisense to only a portion of a coding or noncoding region of the mRNA. For example, the antisense oligonucleotide can be complementary to the region surrounding the translation start site of the mRNA, e.g., between the −10 and +10 regions of the target gene nucleotide sequence of interest. An antisense oligonucleotide can be, for example, about 7, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, or more nucleotides in length. The antisense nucleic acids, which include the ribozymes described hereafter, can be designed to target a nucleotide sequence in FIGS. 1A-1B or FIG. 2 or a variant thereof. Among the variants, minor alleles and major alleles can be targeted, and antisense nucleic acids directed to those associated with a higher risk of breast cancer are often designed, tested, and administered to subjects.
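As a minimal sketch of the start-site example above, the following Python code derives an antisense oligonucleotide complementary to the −10 to +10 window around a translation start codon. It assumes the sense strand is supplied in DNA letters, that +1 is the A of the ATG with no position 0 (so the window spans 20 nucleotides), and that the start codon index is known; the function names and the example sequence are hypothetical.

```python
# Translation table for Watson-Crick complements in the DNA alphabet.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def antisense_around_start(sense: str, start_index: int,
                           upstream: int = 10, downstream: int = 10) -> str:
    """Antisense oligo for the window [-upstream, +downstream] around the start codon.

    `start_index` is the 0-based index of the A of the ATG in `sense`.
    Assumes +1 is that A and there is no position 0.
    """
    sense = sense.upper()
    if sense[start_index:start_index + 3] != "ATG":
        raise ValueError("start_index does not point at an ATG start codon")
    lo, hi = start_index - upstream, start_index + downstream
    if lo < 0 or hi > len(sense):
        raise ValueError("window extends beyond the supplied sequence")
    return reverse_complement(sense[lo:hi])

# Hypothetical sense-strand fragment: 10 nt of 5' UTR, then the coding sequence.
example_sense = "GGCCGGCCGG" + "ATGAAACCCG" + "TTT"
oligo = antisense_around_start(example_sense, start_index=10)
```

For an RNA oligonucleotide the T residues would be written as U; chemical modifications such as phosphorothioate linkages are outside the scope of this sketch.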

An antisense nucleic acid can be constructed using chemical synthesis and enzymatic ligation reactions using standard procedures. For example, an antisense nucleic acid (e.g., an antisense oligonucleotide) can be chemically synthesized using naturally occurring nucleotides or variously modified nucleotides designed to increase the biological stability of the molecules or to increase the physical stability of the duplex formed between the antisense and sense nucleic acids, e.g., phosphorothioate derivatives and acridine substituted nucleotides can be used. Antisense nucleic acid also can be produced biologically using an expression vector into which a nucleic acid has been subcloned in an antisense orientation (i.e., RNA transcribed from the inserted nucleic acid will be of an antisense orientation to a target nucleic acid of interest, described further in the following subsection).

When utilized as therapeutics, antisense nucleic acids typically are administered to a subject (e.g., by direct injection at a tissue site) or generated in situ such that they hybridize with or bind to cellular mRNA and/or genomic DNA encoding a polypeptide and thereby inhibit expression of the polypeptide, for example, by inhibiting transcription and/or translation. Alternatively, antisense nucleic acid molecules can be modified to target selected cells and then are administered systemically. For systemic administration, antisense molecules can be modified such that they specifically bind to receptors or antigens expressed on a selected cell surface, for example, by linking antisense nucleic acid molecules to peptides or antibodies which bind to cell surface receptors or antigens. Antisense nucleic acid molecules can also be delivered to cells using the vectors described herein. Sufficient intracellular concentrations of antisense molecules are achieved by incorporating a strong promoter, such as a pol II or pol III promoter, in the vector construct.

Antisense nucleic acid molecules sometimes are α-anomeric nucleic acid molecules. An α-anomeric nucleic acid molecule forms specific double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the strands run parallel to each other (Gaultier et al., Nucleic Acids Res. 15: 6625-6641 (1987)). Antisense nucleic acid molecules can also comprise a 2′-O-methylribonucleotide (Inoue et al., Nucleic Acids Res. 15: 6131-6148 (1987)) or a chimeric RNA-DNA analogue (Inoue et al., FEBS Lett. 215: 327-330 (1987)). Antisense nucleic acids sometimes are composed of DNA or PNA or any other nucleic acid derivatives described previously.

In another embodiment, an antisense nucleic acid is a ribozyme. A ribozyme having specificity for a nucleotide sequence in FIGS. 1A-1B or FIG. 2 can include one or more sequences complementary to such nucleotide sequences, and a sequence having a known catalytic region responsible for mRNA cleavage (see e.g., U.S. Pat. No. 5,093,246 or Haselhoff and Gerlach, Nature 334: 585-591 (1988)). For example, a derivative of a Tetrahymena L-19 IVS RNA is sometimes utilized in which the nucleotide sequence of the active site is complementary to the nucleotide sequence to be cleaved in an mRNA (see e.g., Cech et al., U.S. Pat. No. 4,987,071; and Cech et al., U.S. Pat. No. 5,116,742). Also, target mRNA sequences can be used to select a catalytic RNA having a specific ribonuclease activity from a pool of RNA molecules (see e.g., Bartel & Szostak, Science 261: 1411-1418 (1993)).

Breast cancer directed molecules include in certain embodiments nucleic acids that can form triple helix structures with a nucleotide sequence of FIGS. 1A-1B or FIG. 2, especially one that includes a regulatory region that controls expression of a polypeptide. Gene expression can be inhibited by targeting nucleotide sequences complementary to the regulatory region of the nucleotide sequence in FIGS. 1A-1B or FIG. 2 (e.g., promoter and/or enhancers) to form triple helical structures that prevent transcription of a gene in target cells (see e.g., Helene, Anticancer Drug Des. 6(6): 569-84 (1991); Helene et al., Ann. N.Y. Acad. Sci. 660: 27-36 (1992); and Maher, Bioessays 14(12): 807-15 (1992)). Potential sequences that can be targeted for triple helix formation can be increased by creating a so-called “switchback” nucleic acid molecule. Switchback molecules are synthesized in an alternating 5′-3′, 3′-5′ manner, such that they base pair with first one strand of a duplex and then the other, eliminating the necessity for a sizeable stretch of either purines or pyrimidines to be present on one strand of a duplex.

Breast cancer directed molecules include RNAi and siRNA nucleic acids. Gene expression may be inhibited by the introduction of double-stranded RNA (dsRNA), which induces potent and specific gene silencing, a phenomenon called RNA interference or RNAi. See, e.g., Fire et al., U.S. Pat. No. 6,506,559; Tuschl et al., PCT International Publication No. WO 01/75164; Kay et al., PCT International Publication No. WO 03/010180A1; or Bosher and Labouesse, Nat. Cell Biol. 2(2): E31-6 (2000). This process has been improved by decreasing the size of the double-stranded RNA to 20-24 base pairs (to create small-interfering RNAs or siRNAs) that “switched off” genes in mammalian cells without initiating an acute phase response, i.e., a host defense mechanism that often results in cell death (see, e.g., Caplen et al., Proc. Natl. Acad. Sci. USA 98(17): 9742-7 (2001) and Elbashir et al., Methods 26(2): 199-213 (2002)). There is increasing evidence of post-transcriptional gene silencing by RNA interference (RNAi) for inhibiting targeted expression at the mRNA level in mammalian cells, including human cells. There is additional evidence of effective methods for inhibiting the proliferation and migration of tumor cells in human patients, and for inhibiting metastatic cancer development (see, e.g., U.S. Patent Application No. US2001000993183; Caplen et al., supra; and Abderrahmani et al., Mol. Cell. Biol. 21(21): 7256-67 (2001)).

An “siRNA” or “RNAi” refers to a nucleic acid that forms a double stranded RNA and has the ability to reduce or inhibit expression of a gene or target gene when the siRNA is delivered to or expressed in the same cell as the gene or target gene. “siRNA” refers to short double-stranded RNA formed by the complementary strands. Complementary portions of the siRNA that hybridize to form the double stranded molecule often have substantial or complete identity to the target molecule sequence. In one embodiment, an siRNA refers to a nucleic acid that has substantial or complete identity to a target gene and forms a double stranded siRNA, such as a nucleotide sequence in FIGS. 1A-1B or FIG. 2.

When designing the siRNA molecules, the targeted region often is selected from a given DNA sequence beginning 50 to 100 nucleotides downstream of the start codon. See, e.g., Elbashir et al., Methods 26: 199-213 (2002). Initially, 5′ or 3′ UTRs and regions nearby the start codon were avoided assuming that UTR-binding proteins and/or translation initiation complexes may interfere with binding of the siRNP or RISC endonuclease complex. Regions of the target 23 nucleotides in length conforming to the sequence motif AA(N19)TT (N, any nucleotide) and having approximately 30% to 70% G/C-content (often about 50% G/C-content) often are selected. If no suitable sequences are found, the search often is extended using the motif NA(N21). The sequence of the sense siRNA sometimes corresponds to (N19)TT or N21 (position 3 to 23 of the 23-nt motif), respectively. In the latter case, the 3′ end of the sense siRNA often is converted to TT. The rationale for this sequence conversion is to generate a symmetric duplex with respect to the sequence composition of the sense and antisense 3′ overhangs. The antisense siRNA is synthesized as the complement to position 1 to 21 of the 23-nt motif. Because position 1 of the 23-nt motif is not recognized sequence-specifically by the antisense siRNA, the 3′-most nucleotide residue of the antisense siRNA can be chosen deliberately. However, the penultimate nucleotide of the antisense siRNA (complementary to position 2 of the 23-nt motif) often is complementary to the targeted sequence. For simplifying chemical synthesis, TT often is utilized. siRNAs corresponding to the target motif NAR(N17)YNN, where R is purine (A, G) and Y is pyrimidine (C, U), often are selected. Respective 21 nucleotide sense and antisense siRNAs often begin with a purine nucleotide and can also be expressed from pol III expression vectors without a change in targeting site. Expression of RNAs from pol III promoters often is efficient when the first transcribed nucleotide is a purine.
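The primary selection rule described above can be sketched in Python as a scan for 23-nt AA(N19)TT targets. The sketch makes several assumptions that are not fixed by the text: the search starts 50 nucleotides downstream of the start codon, the G/C-content is computed over the whole 23-nt target and must lie between 30% and 70%, and sequences are kept in DNA letters (an RNA strand would carry U in place of T). The NA(N21) fallback and the NAR(N17)YNN motif are omitted, and all function and field names are hypothetical.

```python
from typing import List, NamedTuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

class SirnaCandidate(NamedTuple):
    position: int    # 0-based index of position 1 of the 23-nt target
    target: str      # 23-nt target conforming to AA(N19)TT
    sense: str       # positions 3 to 23 of the target: (N19)TT
    antisense: str   # complement of positions 1 to 21, written 5' to 3'
    gc: float        # G/C fraction of the 23-nt target

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)

def find_sirna_candidates(cdna: str, start_codon_index: int,
                          min_offset: int = 50,
                          gc_min: float = 0.30,
                          gc_max: float = 0.70) -> List[SirnaCandidate]:
    """Scan for AA(N19)TT targets downstream of the start codon."""
    seq = cdna.upper()
    candidates = []
    first = start_codon_index + min_offset
    for i in range(first, len(seq) - 22):
        target = seq[i:i + 23]
        if not (target.startswith("AA") and target.endswith("TT")):
            continue
        gc = gc_fraction(target)
        if not gc_min <= gc <= gc_max:
            continue
        candidates.append(SirnaCandidate(
            position=i,
            target=target,
            sense=target[2:23],
            antisense=reverse_complement(target[0:21]),
            gc=gc,
        ))
    return candidates

# Hypothetical cDNA: start codon, 60 nt of filler, one AA(N19)TT target, short tail.
example_cdna = "ATG" + "C" * 60 + "AA" + "GCTA" * 4 + "GCT" + "TT" + "GGGG"
hits = find_sirna_candidates(example_cdna, start_codon_index=0)
```

In this sketch both strands end in TT at the 3′ end, which reflects the symmetric overhang rationale given above.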

The sequence of the siRNA can correspond to the full length target gene, or a subsequence thereof. Often, the siRNA is about 15 to about 50 nucleotides in length (e.g., each complementary sequence of the double stranded siRNA is 15-50 nucleotides in length, and the double stranded siRNA is about 15-50 base pairs in length), sometimes about 20-30 nucleotides in length or about 20-25 nucleotides in length, e.g., 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length. The siRNA sometimes is about 21 nucleotides in length. Methods of using siRNA are well known in the art, and specific siRNA molecules may be purchased from a number of companies including Dharmacon Research, Inc.

Antisense, ribozyme, RNAi and siRNA nucleic acids can be altered to form modified nucleic acid molecules. The nucleic acids can be altered at base moieties, sugar moieties or phosphate backbone moieties to improve stability, hybridization, or solubility of the molecule. For example, the deoxyribose phosphate backbone of nucleic acid molecules can be modified to generate peptide nucleic acids (see Hyrup et al., Bioorganic & Medicinal Chemistry 4(1): 5-23 (1996)). As used herein, the terms “peptide nucleic acid” or “PNA” refer to a nucleic acid mimic such as a DNA mimic, in which the deoxyribose phosphate backbone is replaced by a pseudopeptide backbone and only the four natural nucleobases are retained. The neutral backbone of a PNA can allow for specific hybridization to DNA and RNA under conditions of low ionic strength. Synthesis of PNA oligomers can be performed using standard solid phase peptide synthesis protocols as described, for example, in Hyrup et al., (1996) supra and Perry-O'Keefe et al., Proc. Natl. Acad. Sci. 93: 14670-675 (1996).

PNA nucleic acids can be used in prognostic, diagnostic, and therapeutic applications. For example, PNAs can be used as antisense or antigene agents for sequence-specific modulation of gene expression by, for example, inducing transcription or translation arrest or inhibiting replication. PNA nucleic acid molecules can also be used in the analysis of single base pair mutations in a gene (e.g., by PNA-directed PCR clamping); as “artificial restriction enzymes” when used in combination with other enzymes (e.g., S1 nucleases (Hyrup et al., (1996) supra)); or as probes or primers for DNA sequencing or hybridization (Hyrup et al., (1996) supra; Perry-O'Keefe supra).

In other embodiments, oligonucleotides may include other appended groups such as peptides (e.g., for targeting host cell receptors in vivo), or agents facilitating transport across cell membranes (see e.g., Letsinger et al., Proc. Natl. Acad. Sci. USA 86: 6553-6556 (1989); Lemaitre et al., Proc. Natl. Acad. Sci. USA 84: 648-652 (1987); PCT Publication No. WO88/09810) or the blood-brain barrier (see, e.g., PCT Publication No. WO89/10134). In addition, oligonucleotides can be modified with hybridization-triggered cleavage agents (see, e.g., Krol et al., BioTechniques 6: 958-976 (1988)) or intercalating agents (see, e.g., Zon, Pharm. Res. 5: 539-549 (1988)). To this end, the oligonucleotide may be conjugated to another molecule (e.g., a peptide, hybridization-triggered cross-linking agent, transport agent, or hybridization-triggered cleavage agent).

Also included herein are molecular beacon oligonucleotide primer and probe molecules having one or more regions complementary to a nucleotide sequence of FIGS. 1A-1B or FIG. 2 and two complementary regions, one having a fluorophore and one a quencher, such that the molecular beacon is useful for quantifying the presence of the nucleic acid in a sample. Molecular beacon nucleic acids are described, for example, in Lizardi et al., U.S. Pat. No. 5,854,033; Nazarenko et al., U.S. Pat. No. 5,866,336; and Livak et al., U.S. Pat. No. 5,876,930.

Antibodies

The term “antibody” as used herein refers to an immunoglobulin molecule or immunologically active portion thereof, i.e., an antigen-binding portion. Examples of immunologically active portions of immunoglobulin molecules include F(ab) and F(ab′)2 fragments which can be generated by treating the antibody with an enzyme such as pepsin. An antibody sometimes is a polyclonal, monoclonal, recombinant (e.g., a chimeric or humanized), fully human, non-human (e.g., murine), or a single chain antibody. An antibody may have effector function and can fix complement, and is sometimes coupled to a toxin or imaging agent.

A full-length polypeptide or antigenic peptide fragment encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2 can be used as an immunogen or can be used to identify antibodies made with other immunogens, e.g., cells, membrane preparations, and the like. An antigenic peptide often includes at least 8 amino acid residues of the amino acid sequences encoded by a nucleotide sequence of FIGS. 1A-1B or FIG. 2 and encompasses an epitope. Antigenic peptides sometimes include 10 or more amino acids, 15 or more amino acids, 20 or more amino acids, or 30 or more amino acids. Hydrophilic and hydrophobic fragments of polypeptides sometimes are used as immunogens.

Epitopes encompassed by the antigenic peptide are regions located on the surface of the polypeptide (e.g., hydrophilic regions) as well as regions with high antigenicity. For example, an Emini surface probability analysis of the human polypeptide sequence can be used to indicate the regions that have a particularly high probability of being localized to the surface of the polypeptide and are thus likely to constitute surface residues useful for targeting antibody production. The antibody may bind an epitope on any domain or region on polypeptides described herein.

Also, chimeric, humanized, and completely human antibodies are useful for applications which include repeated administration to subjects. Chimeric and humanized monoclonal antibodies, comprising both human and non-human portions, can be made using standard recombinant DNA techniques known in the art, for example using methods described in Robinson et al., International Application No. PCT/US86/02269; Akira et al., European Patent Application 184,187; Taniguchi, M., European Patent Application 171,496; Morrison et al., European Patent Application 173,494; Neuberger et al., PCT International Publication No. WO 86/01533; Cabilly et al., U.S. Pat. No. 4,816,567; Cabilly et al., European Patent Application 125,023; Better et al., Science 240: 1041-1043 (1988); Liu et al., Proc. Natl. Acad. Sci. USA 84: 3439-3443 (1987); Liu et al., J. Immunol. 139: 3521-3526 (1987); Sun et al., Proc. Natl. Acad. Sci. USA 84: 214-218 (1987); Nishimura et al., Canc. Res. 47: 999-1005 (1987); Wood et al., Nature 314: 446-449 (1985); Shaw et al., J. Natl. Cancer Inst. 80: 1553-1559 (1988); Morrison, S. L., Science 229: 1202-1207 (1985); Oi et al., BioTechniques 4: 214 (1986); Winter, U.S. Pat. No. 5,225,539; Jones et al., Nature 321: 522-525 (1986); Verhoeyen et al., Science 239: 1534 (1988); and Beidler et al., J. Immunol. 141: 4053-4060 (1988).

Completely human antibodies are particularly desirable for therapeutic treatment of human patients. Such antibodies can be produced using transgenic mice that are incapable of expressing endogenous immunoglobulin heavy and light chain genes, but which can express human heavy and light chain genes. See, for example, Lonberg and Huszar, Int. Rev. Immunol. 13: 65-93 (1995); and U.S. Pat. Nos. 5,625,126; 5,633,425; 5,569,825; 5,661,016; and 5,545,806. In addition, companies such as Abgenix, Inc. (Fremont, Calif.) and Medarex, Inc. (Princeton, N.J.), can be engaged to provide human antibodies directed against a selected antigen using technology similar to that described above. Completely human antibodies that recognize a selected epitope also can be generated using a technique referred to as “guided selection.” In this approach a selected non-human monoclonal antibody (e.g., a murine antibody) is used to guide the selection of a completely human antibody recognizing the same epitope. This technology is described for example by Jespers et al., Bio/Technology 12: 899-903 (1994).

An antibody can be a single chain antibody. A single chain antibody (scFv) can be engineered (see, e.g., Colcher et al., Ann. N.Y. Acad. Sci. 880: 263-80 (1999); and Reiter, Clin. Cancer Res. 2: 245-52 (1996)). Single chain antibodies can be dimerized or multimerized to generate multivalent antibodies having specificities for different epitopes of the same target polypeptide.

Antibodies also may be selected or modified so that they exhibit reduced or no ability to bind an Fc receptor. For example, an antibody may be an isotype or subtype, fragment or other mutant, which does not support binding to an Fc receptor (e.g., it has a mutagenized or deleted Fc receptor binding region).

Also, an antibody (or fragment thereof) may be conjugated to a therapeutic moiety such as a cytotoxin, a therapeutic agent or a radioactive metal ion. A cytotoxin or cytotoxic agent includes any agent that is detrimental to cells. Examples include taxol, cytochalasin B, gramicidin D, ethidium bromide, emetine, mitomycin, etoposide, teniposide, vincristine, vinblastine, colchicine, doxorubicin, daunorubicin, dihydroxy anthracin dione, mitoxantrone, mithramycin, actinomycin D, 1-dehydrotestosterone, glucocorticoids, procaine, tetracaine, lidocaine, propranolol, and puromycin and analogs or homologs thereof. Therapeutic agents include, but are not limited to, antimetabolites (e.g., methotrexate, 6-mercaptopurine, 6-thioguanine, cytarabine, 5-fluorouracil, dacarbazine), alkylating agents (e.g., mechlorethamine, thiotepa, chlorambucil, melphalan, carmustine (BCNU) and lomustine (CCNU), cyclophosphamide, busulfan, dibromomannitol, streptozotocin, mitomycin C, and cis-dichlorodiamine platinum (II) (DDP) cisplatin), anthracyclines (e.g., daunorubicin (formerly daunomycin) and doxorubicin), antibiotics (e.g., dactinomycin (formerly actinomycin), bleomycin, mithramycin, and anthramycin (AMC)), and anti-mitotic agents (e.g., vincristine and vinblastine).

Antibody conjugates can be used for modifying a given biological response. For example, the drug moiety may be a protein or polypeptide possessing a desired biological activity. Such proteins may include, for example, a toxin such as abrin, ricin A, pseudomonas exotoxin, or diphtheria toxin; a polypeptide such as tumor necrosis factor, γ-interferon, α-interferon, nerve growth factor, platelet derived growth factor, tissue plasminogen activator; or, biological response modifiers such as, for example, lymphokines, interleukin-1 (“IL-1”), interleukin-2 (“IL-2”), interleukin-6 (“IL-6”), granulocyte macrophage colony stimulating factor (“GM-CSF”), granulocyte colony stimulating factor (“G-CSF”), or other growth factors. Also, an antibody can be conjugated to a second antibody to form an antibody heteroconjugate as described by Segal in U.S. Pat. No. 4,676,980, for example.

An antibody (e.g., monoclonal antibody) can be used to isolate targetpolypeptides by standard techniques, such as affinity chromatography orimmunoprecipitation. Moreover, an antibody can be used to detect atarget polypeptide (e.g., in a cellular lysate or cell supernatant) inorder to evaluate the abundance and pattern of expression of thepolypeptide. Antibodies can be used diagnostically to monitorpolypeptide levels in tissue as part of a clinical testing procedure,e.g., to determine the efficacy of a given treatment regimen. Detectioncan be facilitated by coupling (i.e., physically linking) the antibodyto a detectable substance (i.e., antibody labeling). Examples ofdetectable substances include various enzymes, prosthetic groups,fluorescent materials, luminescent materials, bioluminescent materials,and radioactive materials. Examples of suitable enzymes includehorseradish peroxidase, alkaline phosphatase, β-galactosidase, oracetylcholinesterase; examples of suitable prosthetic group complexesinclude streptavidin/biotin and avidin/biotin; examples of suitablefluorescent materials include umbelliferone, fluorescein, fluoresceinisothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansylchloride or phycoerythrin; an example of a luminescent material includesluminol; examples of bioluminescent materials include luciferase,luciferin, and aequorin, and examples of suitable radioactive materialinclude ¹²⁵I, ¹³¹I, ³⁵S or ³H. Also, an antibody can be utilized as atest molecule for determining whether it can treat breast cancer, and asa therapeutic for administration to a subject for treating breastcancer.

An antibody can be made by immunizing with a purified antigen, or a fragment thereof, e.g., a fragment described herein, a membrane associated antigen, tissues, e.g., crude tissue preparations, whole cells, preferably living cells, lysed cells, or cell fractions.

Included herein are antibodies which bind only a native polypeptide, only denatured or otherwise non-native polypeptide, or which bind both, as well as those having linear or conformational epitopes. Conformational epitopes sometimes can be identified by selecting antibodies that bind to native but not denatured polypeptide. Also featured are antibodies that specifically bind to a polypeptide variant associated with breast cancer.

Screening Assays

Featured herein are methods for identifying a candidate therapeutic fortreating breast cancer. The methods comprise contacting a test moleculewith a target molecule in a system. A “target molecule” as used hereinrefers to a nucleic acid of FIGS. 1A-1B or FIG. 2, a substantiallyidentical nucleic acid thereof, or a fragment thereof, and an encodedpolypeptide of the foregoing. The method also comprises determining thepresence or absence of an interaction between the test molecule and thetarget molecule, where the presence of an interaction between the testmolecule and the nucleic acid or polypeptide identifies the testmolecule as a candidate breast cancer therapeutic. The interactionbetween the test molecule and the target molecule may be quantified.

Test molecules and candidate therapeutics include, but are not limited to, compounds, antisense nucleic acids, siRNA molecules, ribozymes, polypeptides or proteins encoded by nucleic acids in FIGS. 1A-1B or FIG. 2, or fragments thereof, and immunotherapeutics (e.g., antibodies and HLA-presented polypeptide fragments). A test molecule or candidate therapeutic may act as a modulator of target molecule concentration or target molecule function in a system. A “modulator” may agonize (i.e., up-regulate) or antagonize (i.e., down-regulate) a target molecule concentration partially or completely in a system by affecting such cellular functions as DNA replication and/or DNA processing (e.g., DNA methylation or DNA repair), RNA transcription and/or RNA processing (e.g., removal of intronic sequences and/or translocation of spliced mRNA from the nucleus), polypeptide production (e.g., translation of the polypeptide from mRNA), and/or polypeptide post-translational modification (e.g., glycosylation, phosphorylation, and proteolysis of pro-polypeptides). A modulator may also agonize or antagonize a biological function of a target molecule partially or completely, where the function may include adopting a certain structural conformation, interacting with one or more binding partners, ligand binding, catalysis (e.g., phosphorylation, dephosphorylation, hydrolysis, methylation, and isomerization), and an effect upon a cellular event (e.g., affecting progression of breast cancer).

As used herein, the term “system” refers to a cell free in vitro environment and a cell-based environment such as a collection of cells, a tissue, an organ, or an organism. A system is “contacted” with a test molecule in a variety of manners, including adding molecules in solution and allowing them to interact with one another by diffusion, cell injection, and any administration routes in an animal. As used herein, the term “interaction” refers to an effect of a test molecule on a target molecule, where the effect sometimes is binding between the test molecule and the target molecule, and sometimes is an observable change in cells, tissue, or organism.

There are many standard methods for detecting the presence or absence of interaction between a test molecule and a target molecule. For example, titrimetric, acidimetric, radiometric, NMR, monolayer, polarographic, spectrophotometric, fluorescent, and ESR assays probative of a target molecule interaction may be utilized.

Test molecule/target molecule interactions can be detected and/orquantified using assays known in the art. For example, an interactioncan be determined by labeling the test molecule and/or the targetmolecule, where the label is covalently or non-covalently attached tothe test molecule or target molecule. The label is sometimes aradioactive molecule such as ¹²⁵I, ¹³¹I, ³⁵S or ³H, which can bedetected by direct counting of radioemission or by scintillationcounting. Also, enzymatic labels such as horseradish peroxidase,alkaline phosphatase, or luciferase may be utilized where the enzymaticlabel can be detected by determining conversion of an appropriatesubstrate to product. In addition, presence or absence of an interactioncan be determined without labeling. For example, a microphysiometer(e.g., Cytosensor) is an analytical instrument that measures the rate atwhich a cell acidifies its environment using a light-addressablepotentiometric sensor (LAPS). Changes in this acidification rate can beused as an indication of an interaction between a test molecule andtarget molecule (McConnell, H. M. et al., Science 257: 1906-1912(1992)).

In cell-based systems, cells typically include a nucleic acid from FIGS. 1A-1B or FIG. 2, an encoded polypeptide, or substantially identical nucleic acid or polypeptide thereof, and are often of mammalian origin, although the cell can be of any origin. Whole cells, cell homogenates, and cell fractions (e.g., cell membrane fractions) can be subjected to analysis. Where interactions between a test molecule and a target polypeptide are monitored, soluble and/or membrane bound forms of the polypeptide may be utilized. Where membrane-bound forms of the polypeptide are used, it may be desirable to utilize a solubilizing agent. Examples of such solubilizing agents include non-ionic detergents such as n-octylglucoside, n-dodecylglucoside, n-dodecylmaltoside, octanoyl-N-methylglucamide, decanoyl-N-methylglucamide, Triton® X-100, Triton® X-114, Thesit®, Isotridecylpoly(ethylene glycol ether)_(n), 3-[(3-cholamidopropyl)dimethylammonio]-1-propane sulfonate (CHAPS), 3-[(3-cholamidopropyl)dimethylammonio]-2-hydroxy-1-propane sulfonate (CHAPSO), or N-dodecyl-N,N-dimethyl-3-ammonio-1-propane sulfonate.

An interaction between a test molecule and target molecule also can bedetected by monitoring fluorescence energy transfer (FET) (see, e.g.,Lakowicz et al., U.S. Pat. No. 5,631,169; Stavrianopoulos et al. U.S.Pat. No. 4,868,103). A fluorophore label on a first, “donor” molecule isselected such that its emitted fluorescent energy will be absorbed by afluorescent label on a second, “acceptor” molecule, which in turn isable to fluoresce due to the absorbed energy. Alternately, the “donor”polypeptide molecule may simply utilize the natural fluorescent energyof tryptophan residues. Labels are chosen that emit differentwavelengths of light, such that the “acceptor” molecule label may bedifferentiated from that of the “donor”. Since the efficiency of energytransfer between the labels is related to the distance separating themolecules, the spatial relationship between the molecules can beassessed. In a situation in which binding occurs between the molecules,the fluorescent emission of the “acceptor” molecule label in the assayshould be maximal. An FET binding event can be conveniently measuredthrough standard fluorometric detection means well known in the art(e.g., using a fluorimeter).
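The distance dependence noted above can be illustrated with the standard Förster relation, E = 1 / (1 + (r/R0)^6). This relation is not recited in the present description and is offered only as a minimal sketch. The function name `fret_efficiency` and the example Förster radius are hypothetical.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Return the energy transfer efficiency for a donor/acceptor pair.

    Uses the standard Forster relation E = 1 / (1 + (r / R0)**6), where r is
    the donor-acceptor separation and R0 is the separation at which the
    efficiency is 50%. Both values are illustrative inputs (assumptions).
    """
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and Forster radius must be > 0")
    ratio = distance_nm / forster_radius_nm
    return 1.0 / (1.0 + ratio ** 6)


if __name__ == "__main__":
    # Hypothetical Forster radius of 5 nm; closer molecules transfer more energy.
    for r in (2.0, 5.0, 8.0):
        print(f"r = {r} nm -> E = {fret_efficiency(r, 5.0):.3f}")
```

Because the efficiency falls off with the sixth power of the separation, a bound donor/acceptor pair gives a markedly stronger acceptor signal than an unbound pair. This is consistent with the maximal acceptor emission expected upon binding.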

In another embodiment, determining the presence or absence of aninteraction between a test molecule and a target molecule can beeffected by monitoring surface plasmon resonance (see, e.g., Sjolander &Urbaniczk, Anal. Chem. 63: 2338-2345. (1991) and Szabo et al., Curr.Opin. Struct. Biol. 5: 699-705 (1995)). “Surface plasmon resonance” or“biomolecular interaction analysis (BIA)” can be utilized to detectbiospecific interactions in real time, without labeling any of theinteractants (e.g., BIAcore). Changes in the mass at the binding surface(indicative of a binding event) result in alterations of the refractiveindex of light near the surface (the optical phenomenon of surfaceplasmon resonance (SPR)), resulting in a detectable signal which can beused as an indication of real-time reactions between biologicalmolecules.

In another embodiment, the target molecule or test molecules are anchored to a solid phase, facilitating the detection of target molecule/test molecule complexes and separation of the complexes from free, uncomplexed molecules. The target molecule or test molecule is immobilized to the solid support. In an embodiment, the target molecule is anchored to a solid surface, and the test molecule, which is not anchored, can be labeled, either directly or indirectly, with detectable labels discussed herein.

It sometimes is desirable to immobilize a target molecule, ananti-target molecule antibody, and/or test molecules to facilitateseparation of target molecule/test molecule complexes from uncomplexedforms, as well as to accommodate automation of the assay. The attachmentbetween a test molecule and/or target molecule and the solid support maybe covalent or non-covalent (see, e.g., U.S. Pat. No. 6,022,688 fornon-covalent attachments). The solid support may be one or more surfacesof the system, such as one or more surfaces in each well of a microtitreplate, a surface of a silicon wafer, a surface of a bead (see, e.g.,Lam, Nature 354: 82-84 (1991)) that is optionally linked to anothersolid support, or a channel in a microfluidic device, for example. Typesof solid supports, linker molecules for covalent and non-covalentattachments to solid supports, and methods for immobilizing nucleicacids and other molecules to solid supports are well known (see, e.g.,U.S. Pat. Nos. 6,261,776; 5,900,481; 6,133,436; and 6,022,688; and WIPOpublication WO 01/18234).

In an embodiment, target molecule may be immobilized to surfaces viabiotin and streptavidin. For example, biotinylated target polypeptidecan be prepared from biotin-NHS(N-hydroxy-succinimide) using techniquesknown in the art (e.g., biotinylation kit, Pierce Chemicals, Rockford,Ill.), and immobilized in the wells of streptavidin-coated 96 wellplates (Pierce Chemical). In another embodiment, a target polypeptidecan be prepared as a fusion polypeptide. For example,glutathione-S-transferase/target polypeptide fusion can be adsorbed ontoglutathione sepharose beads (Sigma Chemical, St. Louis, Mo.) orglutathione derivatized microtitre plates, which are then combined witha test molecule under conditions conducive to complex formation (e.g.,at physiological conditions for salt and pH). Following incubation, thebeads or microtitre plate wells are washed to remove any unboundcomponents, or the matrix is immobilized in the case of beads, andcomplex formation is determined directly or indirectly as describedabove. Alternatively, the complexes can be dissociated from the matrix,and the level of target molecule binding or activity is determined usingstandard techniques.

In an embodiment, the non-immobilized component is added to the coatedsurface containing the anchored component. After the reaction iscomplete, unreacted components are removed (e.g. by washing) underconditions such that a significant percentage of complexes formed willremain immobilized to the solid surface. The detection of complexesanchored on the solid surface can be accomplished in a number ofmanners. Where the previously non-immobilized component is pre-labeled,the detection of label immobilized on the surface indicates thatcomplexes were formed. Where the previously non-immobilized component isnot pre-labeled, an indirect label can be used to detect complexesanchored on the surface, e.g., by adding a labeled antibody specific forthe immobilized component, where the antibody, in turn, can be directlylabeled or indirectly labeled with, e.g., a labeled anti-Ig antibody.

In another embodiment, an assay is performed utilizing antibodies thatspecifically bind target molecule or test molecule but do not interferewith binding of the target molecule to the test molecule. Suchantibodies can be derivatized to a solid support, and unbound targetmolecule may be immobilized by antibody conjugation. Methods fordetecting such complexes, in addition to those described above for theGST-immobilized complexes, include immunodetection of complexes usingantibodies reactive with the target molecule, as well as enzyme-linkedassays which rely on detecting an enzymatic activity associated with thetarget molecule.

Cell free assays also can be conducted in a liquid phase. In such anassay, reaction products are separated from unreacted components, by anyof a number of standard techniques, including but not limited to:differential centrifugation (see, e.g., Rivas, G., and Minton, TrendsBiochem Sci August;18(8): 284-7 (1993)); chromatography (gel filtrationchromatography, ion-exchange chromatography); electrophoresis (see,e.g., Ausubel et al., eds. Current Protocols in Molecular Biology, J.Wiley: New York (1999)); and immunoprecipitation (see, e.g., Ausubel etal., eds., supra). Media and chromatographic techniques are known to oneskilled in the art (see, e.g. Heegaard, J. Mol. Recognit. Winter;11(1-6): 141-8 (1998); Hage & Tweed, J. Chromatogr. B Biomed. Sci. Appl.October 10; 699 (1-2): 499-525 (1997)). Further, fluorescence energytransfer may also be conveniently utilized, as described herein, todetect binding without further purification of the complex fromsolution.

In another embodiment, modulators of target molecule expression are identified. For example, a cell or cell free mixture is contacted with a candidate compound and the expression of target mRNA or target polypeptide is evaluated relative to the level of expression of target mRNA or target polypeptide in the absence of the candidate compound. When expression of target mRNA or target polypeptide is greater in the presence of the candidate compound than in its absence, the candidate compound is identified as an agonist of target mRNA or target polypeptide expression. Alternatively, when expression of target mRNA or target polypeptide is less (e.g., less with statistical significance) in the presence of the candidate compound than in its absence, the candidate compound is identified as an antagonist or inhibitor of target mRNA or target polypeptide expression. The level of target mRNA or target polypeptide expression can be determined by methods described herein.
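The comparison described above can be sketched as follows. This is a minimal illustration only. The function name and the fold-change cutoff (`min_fold_change`) are assumptions introduced here; the cutoff stands in for whatever significance criterion a practitioner would apply and is not a value taken from this description.

```python
from statistics import mean


def classify_expression_modulator(with_compound, without_compound,
                                  min_fold_change=1.5):
    """Label a candidate compound from replicate expression measurements.

    with_compound / without_compound: replicate target mRNA or polypeptide
    levels measured in the presence / absence of the candidate compound.
    min_fold_change: hypothetical cutoff used here in place of a formal
    statistical test (assumption, not from the source text).
    """
    treated = mean(with_compound)
    control = mean(without_compound)
    if control <= 0:
        raise ValueError("control expression must be positive")
    fold = treated / control
    if fold >= min_fold_change:
        return "agonist"      # expression greater in presence of compound
    if fold <= 1.0 / min_fold_change:
        return "antagonist"   # expression less in presence of compound
    return "no call"          # difference too small to classify


if __name__ == "__main__":
    print(classify_expression_modulator([3.0, 3.2, 2.8], [1.0, 1.1, 0.9]))
```

A fixed fold-change cutoff is used only to keep the sketch short. A replicate-aware statistical test could be substituted without changing the overall logic.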

In another embodiment, binding partners that interact with a target molecule are detected. The target molecules can interact with one or more cellular or extracellular macromolecules, such as polypeptides in vivo, and these interacting molecules are referred to herein as “binding partners.” Binding partners can agonize or antagonize target molecule biological activity. Also, test molecules that agonize or antagonize interactions between target molecules and binding partners can be useful as therapeutic molecules as they can up-regulate or down-regulate target molecule activity in vivo and thereby treat breast cancer.

Binding partners of target molecules can be identified by methods known in the art. For example, binding partners may be identified by lysing cells and analyzing cell lysates by electrophoretic techniques. Alternatively, a two-hybrid assay or three-hybrid assay can be utilized (see, e.g., U.S. Pat. No. 5,283,317; Zervos et al., Cell 72: 223-232 (1993); Madura et al., J. Biol. Chem. 268: 12046-12054 (1993); Bartel et al., Biotechniques 14: 920-924 (1993); Iwabuchi et al., Oncogene 8: 1693-1696 (1993); and Brent WO94/10300). A two-hybrid system is based on the modular nature of most transcription factors, which consist of separable DNA-binding and activation domains. The assay often utilizes two different DNA constructs. In one construct, a nucleic acid from FIGS. 1A-1B or FIG. 2 (sometimes referred to as the “bait”) is fused to a gene encoding the DNA binding domain of a known transcription factor (e.g., GAL-4). In another construct, a DNA sequence from a library of DNA sequences that encodes a potential binding partner (sometimes referred to as the “prey”) is fused to a gene that encodes an activation domain of the known transcription factor. Sometimes, a nucleic acid from FIGS. 1A-1B or FIG. 2 can be fused to the activation domain. If the “bait” and the “prey” molecules interact in vivo, the DNA-binding and activation domains of the transcription factor are brought into close proximity. This proximity allows transcription of a reporter gene (e.g., LacZ) which is operably linked to a transcriptional regulatory site responsive to the transcription factor. Expression of the reporter gene can be detected and cell colonies containing the functional transcription factor can be isolated and used to identify the potential binding partner.

In an embodiment for identifying test molecules that antagonize oragonize complex formation between target molecules and binding partners,a reaction mixture containing the target molecule and the bindingpartner is prepared, under conditions and for a time sufficient to allowcomplex formation. The reaction mixture often is provided in thepresence or absence of the test molecule. The test molecule can beincluded initially in the reaction mixture, or can be added at a timesubsequent to the addition of the target molecule and its bindingpartner. Control reaction mixtures are incubated without the testmolecule or with a placebo. Formation of any complexes between thetarget molecule and the binding partner then is detected. Decreasedformation of a complex in the reaction mixture containing test moleculeas compared to in a control reaction mixture indicates that the moleculeantagonizes target molecule/binding partner complex formation.Alternatively, increased formation of a complex in the reaction mixturecontaining test molecule as compared to in a control reaction mixtureindicates that the molecule agonizes target molecule/binding partnercomplex formation. In another embodiment, complex formation of targetmolecule/binding partner can be compared to complex formation of mutanttarget molecule/binding partner (e.g., amino acid modifications in atarget polypeptide). Such a comparison can be important in those caseswhere it is desirable to identify test molecules that modulateinteractions of mutant but not non-mutated target gene products.

The assays can be conducted in a heterogeneous or homogeneous format. In heterogeneous assays, target molecule and/or the binding partner are immobilized to a solid phase, and complexes are detected on the solid phase at the end of the reaction. In homogeneous assays, the entire reaction is carried out in a liquid phase. In either approach, the order of addition of reactants can be varied to obtain different information about the molecules being tested. For example, test compounds that antagonize target molecule/binding partner interactions can be identified by conducting the reaction in the presence of the test molecule in a competition format. Alternatively, test molecules that disrupt preformed complexes, e.g., molecules with higher binding constants that displace one of the components from the complex, can be tested by adding the test compound to the reaction mixture after complexes have been formed.

In a heterogeneous assay embodiment, the target molecule or the bindingpartner is anchored onto a solid surface (e.g., a microtitre plate),while the non-anchored species is labeled, either directly orindirectly. The anchored molecule can be immobilized by non-covalent orcovalent attachments. Alternatively, an immobilized antibody specificfor the molecule to be anchored can be used to anchor the molecule tothe solid surface. The partner of the immobilized species is exposed tothe coated surface with or without the test molecule. After the reactionis complete, unreacted components are removed (e.g., by washing) suchthat a significant portion of any complexes formed will remainimmobilized on the solid surface. Where the non-immobilized species ispre-labeled, the detection of label immobilized on the surface isindicative of complex. Where the non-immobilized species is notpre-labeled, an indirect label can be used to detect complexes anchoredto the surface; e.g., by using a labeled antibody specific for theinitially non-immobilized species. Depending upon the order of additionof reaction components, test compounds that inhibit complex formation orthat disrupt preformed complexes can be detected.

In another embodiment, the reaction can be conducted in a liquid phase in the presence or absence of test molecule, where the reaction products are separated from unreacted components, and the complexes are detected (e.g., using an immobilized antibody specific for one of the binding components to anchor any complexes formed in solution, and a labeled antibody specific for the other partner to detect anchored complexes). Again, depending upon the order of addition of reactants to the liquid phase, test compounds that inhibit complex formation or that disrupt preformed complexes can be identified.

In an alternate embodiment, a homogeneous assay can be utilized. For example, a preformed complex of the target gene product and the interactive cellular or extracellular binding partner product is prepared. One or both of the target molecule or binding partner is labeled, and the signal generated by the label(s) is quenched upon complex formation (see, e.g., U.S. Pat. No. 4,109,496 that utilizes this approach for immunoassays). Addition of a test molecule that competes with and displaces one of the species from the preformed complex will result in the generation of a signal above background. In this way, test substances that disrupt target molecule/binding partner complexes can be identified.

Candidate therapeutics for treating breast cancer are identified from a group of test molecules that interact with a target molecule. Test molecules are normally ranked according to the degree with which they modulate (e.g., agonize or antagonize) a function associated with the target molecule (e.g., DNA replication and/or processing, RNA transcription and/or processing, polypeptide production and/or processing, and/or biological function/activity), and then top ranking modulators are selected. Also, pharmacogenomic information described herein can determine the rank of a modulator. The top 10% of ranked test molecules often are selected for further testing as candidate therapeutics, and sometimes the top 15%, 20%, or 25% of ranked test molecules are selected for further testing as candidate therapeutics. Candidate therapeutics typically are formulated for administration to a subject.
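The ranking and selection step described above can be sketched as follows. The function name, the score dictionary, and the use of the absolute score as the "degree of modulation" are assumptions made for illustration. The description does not prescribe a particular scoring scheme.

```python
import math


def select_top_modulators(scores, top_fraction=0.10):
    """Rank test molecules by degree of modulation and keep the top fraction.

    scores: mapping of test molecule name -> signed modulation score
            (positive = agonist-like, negative = antagonist-like); the
            scoring convention is hypothetical.
    top_fraction: fraction of ranked molecules to keep (e.g., 0.10, 0.15,
                  0.20, or 0.25).
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    # Rank by magnitude so strong agonists and strong antagonists both rank high.
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    if not ranked:
        return []
    n_keep = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[:n_keep]


if __name__ == "__main__":
    demo = {f"m{i}": float(i) for i in range(1, 11)}  # hypothetical scores
    print(select_top_modulators(demo, 0.10))
    print(select_top_modulators(demo, 0.25))
```

The sketch rounds the cutoff up so that at least one molecule is always retained. Other tie-breaking or rounding conventions would be equally consistent with the text.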

Therapeutic Formulations

Formulations and pharmaceutical compositions typically include in combination with a pharmaceutically acceptable carrier one or more target molecule modulators. The modulator often is a test molecule identified as having an interaction with a target molecule by a screening method described above. The modulator may be a compound, an antisense nucleic acid, a ribozyme, an antibody, or a binding partner. Also, formulations may comprise a target polypeptide or fragment thereof in combination with a pharmaceutically acceptable carrier.

As used herein, the term “pharmaceutically acceptable carrier” includes solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like, compatible with pharmaceutical administration. Supplementary active compounds can also be incorporated into the compositions. Pharmaceutical compositions can be included in a container, pack, or dispenser together with instructions for administration.

A pharmaceutical composition typically is formulated to be compatible with its intended route of administration. Examples of routes of administration include parenteral, e.g., intravenous, intradermal, subcutaneous, oral (e.g., inhalation), transdermal (topical), transmucosal, and rectal administration. Solutions or suspensions used for parenteral, intradermal, or subcutaneous application can include the following components: a sterile diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerin, propylene glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as ethylenediaminetetraacetic acid; buffers such as acetates, citrates or phosphates; and agents for the adjustment of tonicity such as sodium chloride or dextrose. The pH can be adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The parenteral preparation can be enclosed in ampoules, disposable syringes or multiple dose vials made of glass or plastic.

Oral compositions generally include an inert diluent or an ediblecarrier. For the purpose of oral therapeutic administration, the activecompound can be incorporated with excipients and used in the form oftablets, troches, or capsules, e.g., gelatin capsules. Oral compositionscan also be prepared using a fluid carrier for use as a mouthwash.Pharmaceutically compatible binding agents, and/or adjuvant materialscan be included as part of the composition. The tablets, pills,capsules, troches and the like can contain any of the followingingredients, or compounds of a similar nature: a binder such asmicrocrystalline cellulose, gum tragacanth or gelatin; an excipient suchas starch or lactose, a disintegrating agent such as alginic acid,Primogel, or corn starch; a lubricant such as magnesium stearate orSterotes; a glidant such as colloidal silicon dioxide; a sweeteningagent such as sucrose or saccharin; or a flavoring agent such aspeppermint, methyl salicylate, or orange flavoring.

Pharmaceutical compositions suitable for injectable use include sterileaqueous solutions (where water soluble) or dispersions and sterilepowders for the extemporaneous preparation of sterile injectablesolutions or dispersion. For intravenous administration, suitablecarriers include physiological saline, bacteriostatic water, CremophorEL™ (BASF, Parsippany, N.J.) or phosphate buffered saline (PBS). In allcases, the composition must be sterile and should be fluid to the extentthat easy syringability exists. It should be stable under the conditionsof manufacture and storage and must be preserved against thecontaminating action of microorganisms such as bacteria and fungi. Thecarrier can be a solvent or dispersion medium containing, for example,water, ethanol, polyol (for example, glycerol, propylene glycol, andliquid polyethylene glycol, and the like), and suitable mixturesthereof. The proper fluidity can be maintained, for example, by the useof a coating such as lecithin, by the maintenance of the requiredparticle size in the case of dispersion and by the use of surfactants.Prevention of the action of microorganisms can be achieved by variousantibacterial and antifungal agents, for example, parabens,chlorobutanol, phenol, ascorbic acid, thimerosal, and the like. In manycases, it will be preferable to include isotonic agents, for example,sugars, polyalcohols such as mannitol, sorbitol, sodium chloride in thecomposition. Prolonged absorption of the injectable compositions can bebrought about by including in the composition an agent which delaysabsorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions can be prepared by incorporating the activecompound in the required amount in an appropriate solvent with one or acombination of ingredients enumerated above, as required, followed byfiltered sterilization. Generally, dispersions are prepared byincorporating the active compound into a sterile vehicle which containsa basic dispersion medium and the required other ingredients from thoseenumerated above. In the case of sterile powders for the preparation ofsterile injectable solutions, methods of preparation sometimes utilizedare vacuum drying and freeze-drying which yields a powder of the activeingredient plus any additional desired ingredient from a previouslysterile-filtered solution thereof.

For administration by inhalation, the compounds are delivered in the form of an aerosol spray from a pressurized container or dispenser which contains a suitable propellant, e.g., a gas such as carbon dioxide, or a nebulizer.

Systemic administration can also be by transmucosal or transdermalmeans. For transmucosal or transdermal administration, penetrantsappropriate to the barrier to be permeated are used in the formulation.Such penetrants are generally known in the art, and include, forexample, for transmucosal administration, detergents, bile salts, andfusidic acid derivatives. Transmucosal administration can beaccomplished through the use of nasal sprays or suppositories. Fortransdermal administration, the active compounds are formulated intoointments, salves, gels, or creams as generally known in the art.Molecules can also be prepared in the form of suppositories (e.g., withconventional suppository bases such as cocoa butter and otherglycerides) or retention enemas for rectal delivery.

In one embodiment, active molecules are prepared with carriers that willprotect the compound against rapid elimination from the body, such as acontrolled release formulation, including implants and microencapsulateddelivery systems. Biodegradable, biocompatible polymers can be used,such as ethylene vinyl acetate, polyanhydrides, polyglycolic acid,collagen, polyorthoesters, and polylactic acid. Methods for preparationof such formulations will be apparent to those skilled in the art.Materials can also be obtained commercially from Alza Corporation andNova Pharmaceuticals, Inc. Liposomal suspensions (including liposomestargeted to infected cells with monoclonal antibodies to viral antigens)can also be used as pharmaceutically acceptable carriers. These can beprepared according to methods known to those skilled in the art, forexample, as described in U.S. Pat. No. 4,522,811.

It is advantageous to formulate oral or parenteral compositions in dosage unit form for ease of administration and uniformity of dosage. Dosage unit form as used herein refers to physically discrete units suited as unitary dosages for the subject to be treated; each unit containing a predetermined quantity of active compound calculated to produce the desired therapeutic effect in association with the required pharmaceutical carrier.

Toxicity and therapeutic efficacy of such compounds can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD₅₀ (the dose lethal to 50% of the population) and the ED₅₀ (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index and it can be expressed as the ratio LD₅₀/ED₅₀. Molecules which exhibit high therapeutic indices are preferred. While molecules that exhibit toxic side effects may be used, care should be taken to design a delivery system that targets such compounds to the site of affected tissue in order to minimize potential damage to unaffected cells and, thereby, reduce side effects.
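The therapeutic index defined above is a simple ratio, sketched below. The function name and the example dose values are hypothetical and are not data from this description.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the therapeutic index, LD50 / ED50.

    ld50: dose lethal to 50% of the population.
    ed50: dose therapeutically effective in 50% of the population.
    Both doses must be positive and expressed in the same units.
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


if __name__ == "__main__":
    # Hypothetical values in mg/kg; a higher index indicates a wider margin.
    print(therapeutic_index(200.0, 10.0))
```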

The data obtained from the cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage of such molecules lies preferably within a range of circulating concentrations that includes the ED₅₀ with little or no toxicity. The dosage may vary within this range depending upon the dosage form employed and the route of administration utilized. For any molecules used in the methods described herein, the therapeutically effective dose can be estimated initially from cell culture assays. A dose may be formulated in animal models to achieve a circulating plasma concentration range that includes the IC₅₀ (i.e., the concentration of the test compound which achieves a half-maximal inhibition of symptoms) as determined in cell culture. Such information can be used to more accurately determine useful doses in humans. Levels in plasma may be measured, for example, by high performance liquid chromatography.

As defined herein, a therapeutically effective amount of protein or polypeptide (i.e., an effective dosage) ranges from about 0.001 to 30 mg/kg body weight, sometimes about 0.01 to 25 mg/kg body weight, often about 0.1 to 20 mg/kg body weight, and more often about 1 to 10 mg/kg, 2 to 9 mg/kg, 3 to 8 mg/kg, 4 to 7 mg/kg, or 5 to 6 mg/kg body weight. The protein or polypeptide can be administered one time per week for between about 1 to 10 weeks, sometimes between 2 to 8 weeks, often between about 3 to 7 weeks, and more often for about 4, 5, or 6 weeks. The skilled artisan will appreciate that certain factors may influence the dosage and timing required to effectively treat a subject, including but not limited to the severity of the disease or disorder, previous treatments, the general health and/or age of the subject, and other diseases present. Moreover, treatment of a subject with a therapeutically effective amount of a protein, polypeptide, or antibody can include a single treatment or, preferably, can include a series of treatments.

For antibodies, a dosage of 0.1 mg/kg of body weight (generally 10 mg/kg to 20 mg/kg) is often utilized. If the antibody is to act in the brain, a dosage of 50 mg/kg to 100 mg/kg is often appropriate. Generally, partially human antibodies and fully human antibodies have a longer half-life within the human body than other antibodies. Accordingly, lower dosages and less frequent administration are often possible. Modifications such as lipidation can be used to stabilize antibodies and to enhance uptake and tissue penetration (e.g., into the brain). A method for lipidation of antibodies is described by Cruikshank et al., J. Acquired Immune Deficiency Syndromes and Human Retrovirology 14:193 (1997).

Antibody conjugates can be used for modifying a given biological response; the drug moiety is not to be construed as limited to classical chemical therapeutic agents. For example, the drug moiety may be a protein or polypeptide possessing a desired biological activity. Such proteins may include, for example, a toxin such as abrin, ricin A, pseudomonas exotoxin, or diphtheria toxin; a polypeptide such as tumor necrosis factor, alpha-interferon, beta-interferon, nerve growth factor, platelet derived growth factor, or tissue plasminogen activator; or biological response modifiers such as, for example, lymphokines, interleukin-1 (“IL-1”), interleukin-2 (“IL-2”), interleukin-6 (“IL-6”), granulocyte macrophage colony stimulating factor (“GM-CSF”), granulocyte colony stimulating factor (“G-CSF”), or other growth factors. Alternatively, an antibody can be conjugated to a second antibody to form an antibody heteroconjugate as described by Segal in U.S. Pat. No. 4,676,980.

For compounds, exemplary doses include milligram or microgram amounts of the compound per kilogram of subject or sample weight, for example, about 1 microgram per kilogram to about 500 milligrams per kilogram, about 100 micrograms per kilogram to about 5 milligrams per kilogram, or about 1 microgram per kilogram to about 50 micrograms per kilogram. It is understood that appropriate doses of a small molecule depend upon the potency of the small molecule with respect to the expression or activity to be modulated. When one or more of these small molecules is to be administered to an animal (e.g., a human) in order to modulate expression or activity of a polypeptide or nucleic acid described herein, a physician, veterinarian, or researcher may, for example, prescribe a relatively low dose at first, subsequently increasing the dose until an appropriate response is obtained. In addition, it is understood that the specific dose level for any particular animal subject will depend upon a variety of factors including the activity of the specific compound employed, the age, body weight, general health, gender, and diet of the subject, the time of administration, the route of administration, the rate of excretion, any drug combination, and the degree of expression or activity to be modulated.

With regard to nucleic acid formulations, gene therapy vectors can be delivered to a subject by, for example, intravenous injection, local administration (see, e.g., U.S. Pat. No. 5,328,470) or by stereotactic injection (see, e.g., Chen et al., (1994) Proc. Natl. Acad. Sci. USA 91:3054-3057). Pharmaceutical preparations of gene therapy vectors can include a gene therapy vector in an acceptable diluent, or can comprise a slow release matrix in which the gene delivery vehicle is embedded. Alternatively, where the complete gene delivery vector can be produced intact from recombinant cells (e.g., retroviral vectors), the pharmaceutical preparation can include one or more cells which produce the gene delivery system. Examples of gene delivery vectors are described herein.

Therapeutic Methods

A therapeutic formulation described above can be administered to a subject in need of a therapeutic for treating breast cancer. Therapeutic formulations can be administered by any of the paths described herein. With regard to both prophylactic and therapeutic methods of treatment, such treatments may be specifically tailored or modified, based on knowledge obtained from pharmacogenomic analyses described herein.

As used herein, the term “treatment” is defined as the application or administration of a therapeutic formulation to a subject, or application or administration of a therapeutic agent to an isolated tissue or cell line from a subject, with the purpose to cure, heal, alleviate, relieve, alter, remedy, ameliorate, improve or affect breast cancer, symptoms of breast cancer or a predisposition towards breast cancer. A therapeutic formulation includes, but is not limited to, small molecules, peptides, antibodies, ribozymes and antisense oligonucleotides. Administration of a therapeutic formulation can occur prior to the manifestation of symptoms characteristic of breast cancer, such that breast cancer is prevented or delayed in its progression. The appropriate therapeutic composition can be determined based on screening assays described herein.

As discussed, successful treatment of breast cancer can be brought about by techniques that serve to agonize target molecule expression or function, or alternatively, antagonize target molecule expression or function. These techniques include administration of modulators that include, but are not limited to, small organic or inorganic molecules; antibodies (including, for example, polyclonal, monoclonal, humanized, anti-idiotypic, chimeric or single chain antibodies, and Fab, F(ab′)₂ and Fab expression library fragments, scFv molecules, and epitope-binding fragments thereof); and peptides, phosphopeptides, or polypeptides.

Further, antisense and ribozyme molecules that inhibit expression of the target gene can also be used to reduce the level of target gene expression, thus effectively reducing the level of target gene activity. Still further, triple helix molecules can be utilized in reducing the level of target gene activity. Antisense, ribozyme and triple helix molecules are discussed above. It is possible that the use of antisense, ribozyme, and/or triple helix molecules to reduce or inhibit mutant gene expression can also reduce or inhibit the transcription (triple helix) and/or translation (antisense, ribozyme) of mRNA produced by normal target gene alleles, such that the concentration of normal target gene product present can be lower than is necessary for a normal phenotype. In such cases, nucleic acid molecules that encode and express target gene polypeptides exhibiting normal target gene activity can be introduced into cells via gene therapy methods. Alternatively, in instances in which the target gene encodes an extracellular polypeptide, it can be preferable to co-administer normal target gene polypeptide into the cell or tissue in order to maintain the requisite level of cellular or tissue target gene activity.

Another method by which nucleic acid molecules may be utilized in treating or preventing breast cancer is use of aptamer molecules specific for target molecules. Aptamers are nucleic acid molecules having a tertiary structure which permits them to specifically bind to ligands (see, e.g., Osborne, et al., Curr. Opin. Chem. Biol. (1): 5-9 (1997); and Patel, D. J., Curr. Opin. Chem. Biol. June;1(1): 32-46 (1997)).

Yet another method of utilizing nucleic acid molecules for breast cancer treatment is gene therapy, which can also be referred to as allele therapy. Provided herein is a gene therapy method for treating breast cancer in a subject, which comprises contacting one or more cells in the subject or from the subject with a nucleic acid having a first nucleotide sequence. Genomic DNA in the subject comprises a second nucleotide sequence having one or more polymorphic variations associated with breast cancer (e.g., the second nucleic acid is selected from FIGS. 1A-1B or FIG. 2). The first and second nucleotide sequences typically are substantially identical to one another, and the first nucleotide sequence comprises fewer polymorphic variations associated with breast cancer than the second nucleotide sequence. The first nucleotide sequence may comprise a gene sequence that encodes a full-length polypeptide or a fragment thereof. The subject is often a human. Allele therapy methods often are utilized in conjunction with a method of first determining whether a subject has genomic DNA that includes polymorphic variants associated with breast cancer.

In another allele therapy embodiment, provided herein is a method which comprises contacting one or more cells in the subject or from the subject with a polypeptide encoded by a nucleic acid having a first nucleotide sequence. Genomic DNA in the subject comprises a second nucleotide sequence having one or more polymorphic variations associated with breast cancer (e.g., the second nucleic acid is selected from FIGS. 1A-1B or FIG. 2). The first and second nucleotide sequences typically are substantially identical to one another, and the first nucleotide sequence comprises fewer polymorphic variations associated with breast cancer than the second nucleotide sequence. The first nucleotide sequence may comprise a gene sequence that encodes a full-length polypeptide or a fragment thereof. The subject is often a human.

For antibody-based therapies, antibodies can be generated that both are specific for target molecules and reduce target molecule activity. Such antibodies may be administered in instances where antagonizing a target molecule function is appropriate for the treatment of breast cancer.

In circumstances where stimulating antibody production in an animal or a human subject by injection with a target molecule is harmful to the subject, it is possible to generate an immune response against the target molecule by use of anti-idiotypic antibodies (see, e.g., Herlyn, Ann. Med.;31(1): 66-78 (1999); and Bhattacharya-Chatterjee & Foon, Cancer Treat. Res.; 94: 51-68 (1998)). Introducing an anti-idiotypic antibody to a mammal or human subject often stimulates production of anti-anti-idiotypic antibodies, which typically are specific to the target molecule. Vaccines directed to breast cancer also may be generated in this fashion.

In instances where the target molecule is intracellular and whole antibodies are used, internalizing antibodies sometimes are utilized. Lipofectin or liposomes can be used to deliver the antibody or a fragment of the Fab region that binds to the target antigen into cells. Where fragments of the antibody are used, the smallest inhibitory fragment that binds to the target antigen sometimes is utilized. For example, peptides having an amino acid sequence corresponding to the Fv region of the antibody can be used. Alternatively, single chain neutralizing antibodies that bind to intracellular target antigens can also be administered. Such single chain antibodies can be administered, for example, by expressing nucleotide sequences encoding single-chain antibodies within the target cell population (see, e.g., Marasco et al., Proc. Natl. Acad. Sci. USA 90: 7889-7893 (1993)).

Modulators can be administered to a patient at therapeutically effective doses to treat breast cancer. A therapeutically effective dose refers to an amount of the modulator sufficient to result in amelioration of symptoms of breast cancer. Toxicity and therapeutic efficacy of modulators can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD₅₀ (the dose lethal to 50% of the population) and the ED₅₀ (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index, and it can be expressed as the ratio LD₅₀/ED₅₀. Modulators that exhibit large therapeutic indices are preferred. While modulators that exhibit toxic side effects can be used, care should be taken to design a delivery system that targets such molecules to the site of affected tissue in order to minimize potential damage to unaffected cells, thereby reducing side effects.

Data obtained from cell culture assays and animal studies can be used in formulating a range of dosages for use in humans. The dosage of such compounds lies preferably within a range of circulating concentrations that includes the ED₅₀ with little or no toxicity. The dosage can vary within this range depending upon the dosage form employed and the route of administration utilized. For any compound used in the methods described herein, the therapeutically effective dose can be estimated initially from cell culture assays. A dose can be formulated in animal models to achieve a circulating plasma concentration range that includes the IC₅₀ (i.e., the concentration of the test compound that achieves a half-maximal inhibition of symptoms) as determined in cell culture. Such information can be used to more accurately determine useful doses in humans. Levels in plasma can be measured, for example, by high performance liquid chromatography.
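
One common way to estimate an IC₅₀ from cell culture data is to interpolate between the two tested concentrations that bracket half-maximal inhibition. The sketch below assumes that approach (interpolation on a log₁₀ concentration scale); the function name and the data points are hypothetical, and a fitted dose-response model could be used instead.

```python
import math


def estimate_ic50(concentrations, inhibitions):
    """Interpolate the concentration giving 50% inhibition.

    concentrations: positive values in ascending order.
    inhibitions: fractional inhibition (0.0 to 1.0) measured at each concentration.
    """
    points = list(zip(concentrations, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        # Look for the first pair of neighbouring points that bracket 50%.
        if i_lo <= 0.5 <= i_hi and i_hi > i_lo:
            fraction = (0.5 - i_lo) / (i_hi - i_lo)
            log_ic50 = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("50% inhibition is not bracketed by the data")


# Hypothetical measurements: 25% inhibition at 1 uM and 75% inhibition at 100 uM.
ic50_example = estimate_ic50([1.0, 100.0], [0.25, 0.75])
print(ic50_example)
```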

Another example of effective dose determination for an individual is the ability to directly assay levels of “free” and “bound” compound in the serum of the test subject. Such assays may utilize antibody mimics and/or “biosensors” that have been created through molecular imprinting techniques. Molecules that modulate target molecule activity are used as a template, or “imprinting molecule”, to spatially organize polymerizable monomers prior to their polymerization with catalytic reagents. The subsequent removal of the imprinted molecule leaves a polymer matrix which contains a repeated “negative image” of the compound and is able to selectively rebind the molecule under biological assay conditions. A detailed review of this technique can be seen in Ansell et al., Current Opinion in Biotechnology 7: 89-94 (1996) and in Shea, Trends in Polymer Science 2: 166-173 (1994). Such “imprinted” affinity matrixes are amenable to ligand-binding assays, whereby the immobilized monoclonal antibody component is replaced by an appropriately imprinted matrix. An example of the use of such matrixes in this way can be seen in Vlatakis, et al., Nature 361: 645-647 (1993). Through the use of isotope-labeling, the “free” concentration of compound which modulates target molecule expression or activity readily can be monitored and used in calculations of IC₅₀. Such “imprinted” affinity matrixes can also be designed to include fluorescent groups whose photon-emitting properties measurably change upon local and selective binding of target compound. These changes readily can be assayed in real time using appropriate fiberoptic devices, in turn allowing the dose in a test subject to be quickly optimized based on its individual IC₅₀. An example of such a “biosensor” is discussed in Kriz et al., Analytical Chemistry 67: 2142-2144 (1995).

The examples set forth below are intended to illustrate but not limit the invention.

EXAMPLES

In the following studies, a group of subjects was selected according to specific parameters pertaining to breast cancer. Nucleic acid samples obtained from individuals in the study group were subjected to genetic analyses that identified associations between breast cancer and certain polymorphic variants in human genomic DNA. Methods are described for producing target polypeptides encoded by the nucleic acids of FIGS. 1A-1B or FIG. 2 in vitro or in vivo, which can be utilized in methods that screen test molecules for those that interact with target polypeptides. Test molecules identified as being interactors with target polypeptides can be screened further as breast cancer therapeutics.

Example 1 Samples and Pooling Strategies

Sample Selection

Blood samples were collected from individuals diagnosed with breast cancer, which were referred to as case samples. Also, blood samples were collected from individuals who had not been diagnosed with breast cancer or any other form of cancer and who had no history of breast cancer; these samples served as gender and age-matched controls. All of the samples were of German/German descent. A database was created that listed all phenotypic trait information gathered from individuals for each case and control sample. Genomic DNA was extracted from each of the blood samples for genetic analyses.

DNA Extraction from Blood Samples

Six to ten milliliters of whole blood was transferred to a 50 ml tube containing 27 ml of red cell lysis solution (RCL). The tube was inverted until the contents were mixed. Each tube was incubated for 10 minutes at room temperature and inverted once during the incubation. The tubes were then centrifuged for 20 minutes at 3000×g and the supernatant was carefully poured off. 100-200 μl of residual liquid was left in the tube and was pipetted repeatedly to resuspend the pellet in the residual supernatant. White cell lysis solution (WCL) was added to the tube and pipetted repeatedly until completely mixed. Although incubation was not normally required, if cell clumps were visible after mixing, the solution was incubated at 37° C. or room temperature until it was homogeneous. 2 ml of protein precipitation solution was added to the cell lysate. The mixtures were vortexed vigorously at high speed for 20 sec to mix the protein precipitation solution uniformly with the cell lysate, and then centrifuged for 10 minutes at 3000×g. The supernatant containing the DNA was then poured into a clean 15 ml tube, which contained 7 ml of 100% isopropanol. The samples were mixed by inverting the tubes gently until white threads of DNA were visible. Samples were centrifuged for 3 minutes at 2000×g and the DNA was visible as a small white pellet. The supernatant was decanted and 5 ml of 70% ethanol was added to each tube. Each tube was inverted several times to wash the DNA pellet, and then centrifuged for 1 minute at 2000×g. The ethanol was decanted and each tube was drained on clean absorbent paper. The DNA was dried in the tube by inversion for 10 minutes, and then 1000 μl of 1×TE was added. The size of each sample was estimated, and less TE buffer was added during the following DNA hydration step if the sample was smaller. The DNA was allowed to rehydrate overnight at room temperature, and DNA samples were stored at 2-8° C.

For quantification, DNA samples were first placed on a hematology mixer for at least 1 hour. DNA was serially diluted (typically 1:80, 1:160, 1:320, and 1:640 dilutions) so that it would be within the measurable range of standards. 125 μl of diluted DNA was transferred to a clear U-bottom microtitre plate, and 125 μl of 1×TE buffer was transferred into each well using a multichannel pipette. The DNA and 1×TE were mixed by repeated pipetting at least 15 times, and then the plates were sealed. 50 μl of diluted DNA was added to wells A5-H12 of a black flat bottom microtitre plate. Standards were inverted six times to mix them, and then 50 μl of 1×TE buffer was pipetted into well A1, 1000 ng/ml of standard was pipetted into well A2, 500 ng/ml of standard was pipetted into well A3, and 250 ng/ml of standard was pipetted into well A4. PicoGreen (Molecular Probes, Eugene, Oreg.) was thawed and freshly diluted 1:200 according to the number of plates that were being measured. PicoGreen was vortexed and then 50 μl was pipetted into all wells of the black plate with the diluted DNA. DNA and PicoGreen were mixed by pipetting repeatedly at least 10 times with the multichannel pipette. The plate was placed into a Fluoroskan Ascent Machine (microplate fluorometer produced by Labsystems) and the samples were allowed to incubate for 3 minutes before the machine was run using a filter pair with 485 nm excitation and 538 nm emission wavelengths. Samples having measured DNA concentrations of greater than 450 ng/μl were re-measured for confirmation. Samples having measured DNA concentrations of 20 ng/μl or less were re-measured for confirmation.
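
The way fluorescence readings are converted to DNA concentrations is not spelled out above. A minimal sketch, assuming a least-squares straight line fitted through the four standards (blank, 250, 500 and 1000 ng/ml) and a user-supplied overall dilution factor, might look as follows. The function names and the fluorescence readings are hypothetical, and any additional dilution introduced on the plate would need to be folded into `dilution_factor`.

```python
def fit_standard_curve(concentrations, signals):
    """Least-squares line: signal = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def stock_concentration_ng_per_ul(signal, slope, intercept, dilution_factor):
    """Invert the standard curve and scale back to the undiluted stock (ng/ul)."""
    diluted_ng_per_ml = (signal - intercept) / slope
    return diluted_ng_per_ml * dilution_factor / 1000.0  # ng/ml -> ng/ul


# Standards from wells A1-A4 (ng/ml) with hypothetical fluorescence readings.
standards_ng_per_ml = [0.0, 1000.0, 500.0, 250.0]
readings = [2.0, 102.0, 52.0, 27.0]
slope, intercept = fit_standard_curve(standards_ng_per_ml, readings)

# Hypothetical sample: reading of 52.0 from a 1:160 overall dilution.
print(stock_concentration_ng_per_ul(52.0, slope, intercept, dilution_factor=160))
```

Under this sketch, results above 450 ng/μl or at or below 20 ng/μl would be flagged for re-measurement, as described above.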

Pooling Strategies

Samples were placed into one of two groups based on disease status. The two groups were female case samples and female control samples. A select set of samples from each group was utilized to generate pools, and one pool was created for each group. Each individual sample in a pool was represented by an equal amount of genomic DNA. For example, where 25 ng of genomic DNA was utilized in each PCR reaction and there were 200 individuals in each pool, each individual would provide 125 pg of genomic DNA. Inclusion or exclusion of samples for a pool was based upon the following criteria: the sample was derived from an individual characterized as Caucasian; the sample was derived from an individual of German paternal and maternal descent; the database included relevant phenotype information for the individual; case samples were derived from individuals diagnosed with breast cancer; control samples were derived from individuals free of cancer and with no family history of breast cancer; and sufficient genomic DNA was extracted from each blood sample for all allelotyping and genotyping reactions performed during the study. Phenotype information included pre- or post-menopausal status, familial predisposition, country of origin of mother and father, diagnosis with breast cancer (date of primary diagnosis, age of individual as of primary diagnosis, grade or stage of development, occurrence of metastases, e.g., lymph node metastases, organ metastases), condition of body tissue (skin tissue, breast tissue, ovary tissue, peritoneum tissue and myometrium), and method of treatment (surgery, chemotherapy, hormone therapy, radiation therapy). Samples that met these criteria were added to appropriate pools based on gender and disease status.
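
The per-individual contribution quoted above follows directly from dividing the DNA used per reaction by the number of individuals in the pool. The short sketch below reproduces that arithmetic; the function name is hypothetical.

```python
def per_individual_dna_pg(total_ng_per_reaction, pool_size):
    """Genomic DNA contributed by each individual to one pooled reaction, in picograms."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return total_ng_per_reaction * 1000.0 / pool_size  # 1 ng = 1000 pg


# The worked example from the text: 25 ng per PCR reaction, 200 individuals per pool.
print(per_individual_dna_pg(25, 200))
```

The same function can be applied to any other pool size, assuming the amount of DNA per reaction is known.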

The selection process yielded the pools set forth in Table 1, which were used in the studies that follow:

TABLE 1

                      Female case    Female control
Pool size (number)    272            276
Pool criteria         case           control
Mean age (years)      59.6           55.4

Example 2 Association of Polymorphic Variants with Breast Cancer

A whole-genome screen was performed to identify particular SNPs associated with occurrence of breast cancer. As described in Example 1, two sets of samples were utilized, which included samples from female individuals having breast cancer (breast cancer cases) and samples from female individuals not having cancer (female controls). The initial screen of each pool was performed in an allelotyping study, in which certain samples in each group were pooled. By pooling DNA from each group, an allele frequency for each SNP in each group was calculated. These allele frequencies were then compared to one another. Particular SNPs were considered as being associated with breast cancer when allele frequency differences calculated between case and control pools were statistically significant. SNP disease association results obtained from the allelotyping study were then validated by genotyping each associated SNP across all samples from each pool. The results of the genotyping were then analyzed, allele frequencies for each group were calculated from the individual genotyping results, and a p value was calculated to determine whether the case and control groups had statistically significant differences in allele frequencies for a particular SNP. When the genotyping results agreed with the original allelotyping results, the SNP disease association was considered validated at the genetic level.
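
The passage above does not name the statistical test used to compare allele frequencies. Purely as an illustration, the sketch below assumes a Pearson chi-square test (one degree of freedom, no continuity correction) on a 2×2 table of allele counts from individual genotyping; the function name and the counts are hypothetical.

```python
import math


def allele_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square test on allele counts (allele A vs. allele B, cases vs. controls).

    Returns (chi-square statistic, p-value). For one degree of freedom the
    upper-tail probability equals erfc(sqrt(chi2 / 2)).
    """
    table = [[case_a, case_b], [control_a, control_b]]
    row_totals = [case_a + case_b, control_a + control_b]
    col_totals = [case_a + control_a, case_b + control_b]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 A / 40 B alleles in cases, 40 A / 60 B alleles in controls.
chi2, p = allele_chi_square(60, 40, 40, 60)
print(chi2, p)
```

Because each individual carries two alleles, the counts passed to such a function would be based on twice the number of genotyped individuals in each group.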

SNP Panel Used for Genetic Analyses

A whole-genome SNP screen began with an initial screen of approximately 25,000 SNPs over each set of disease and control samples using a pooling approach. The pools studied in the screen are described in Example 1. The SNPs analyzed in this study were part of a set of 25,488 SNPs confirmed as being statistically polymorphic, each being characterized as having a minor allele frequency of greater than 10%. The SNPs in the set reside in genes or in close proximity to genes, and many reside in gene exons. Specifically, SNPs in the set are located in exons, introns, and within 5,000 base-pairs upstream of a transcription start site of a gene. In addition, SNPs were selected according to the following criteria: they are located in ESTs; they are located in Locuslink or Ensembl genes; and they are located in Genomatix promoter predictions. SNPs in the set were also selected on the basis of even spacing across the genome, as depicted in Table 2.

TABLE 2

General Statistics                          Spacing Statistics
Total # of SNPs          25,488             Median           37,058 bp
# of Exonic SNPs         >4,335 (17%)       Minimum*         1,000 bp
# SNPs with refSNP ID    20,776 (81%)       Maximum*         3,000,000 bp
Gene Coverage            >10,000            Mean             122,412 bp
Chromosome Coverage      All                Std Deviation    373,325 bp

*Excludes outliers

Allelotyping and Genotyping Results

The genetic studies summarized above and described in more detail below identified allelic variants associated with breast cancer, which are summarized in FIGS. 1A-1B.

Assay for Verifying, Allelotyping, and Genotyping SNPs

A MassARRAY™ system (Sequenom, Inc.) was utilized to perform SNP genotyping in a high-throughput fashion. This genotyping platform was complemented by a homogeneous, single-tube assay method (hME™ or homogeneous MassEXTEND™ (Sequenom, Inc.)) in which two genotyping primers anneal to and amplify a genomic target surrounding a polymorphic site of interest. A third primer (the MassEXTEND™ primer), which is complementary to the amplified target up to but not including the polymorphism, was then enzymatically extended one or a few bases through the polymorphic site and then terminated.

For each polymorphism, SpectroDESIGNER™ software (Sequenom, Inc.) was used to generate a set of PCR primers and a MassEXTEND™ primer, which were used to genotype the polymorphism. Other primer design software could be used, or one of ordinary skill in the art could manually design primers based on his or her knowledge of the relevant factors and considerations in designing such primers. Table 3 shows PCR primers and Table 4 shows extension primers used for analyzing polymorphisms. The initial PCR amplification reaction was performed in a 5 μl total volume containing 1×PCR buffer with 1.5 mM MgCl₂ (Qiagen), 200 μM each of dATP, dGTP, dCTP, dTTP (Gibco-BRL), 2.5 ng of genomic DNA, 0.1 units of HotStar DNA polymerase (Qiagen), and 200 nM each of forward and reverse PCR primers specific for the polymorphic region of interest.

TABLE 3
PCR Primers

SNP Reference    Forward PCR primer                Reverse PCR primer
911229           TGCAAGAATGACACTCTAGC              ATCATTCCCATTACTGATGG
1020445          TGGGCAGGAATAAGGCAAAC              CCAATAGGTTGCCTTTCCTG
161446           TAGTCTTGAAGGCCTTTGAC              AGATACGTCCCATCAATACG
AA               CTTTTTCCTTTCCAGCAAGG              TCCTGATTTGTTTCCAGTCC
1868220          ATGAGCCAGTACAGAATTTG              GCCCATCCATTCAAAGATTT
868767           TTTTGCCTCTTCTCTTCCTC              GAAGGCAGAAATAGTCATGC
872478           GCAAATCTTTACTTACAGGAG             GGGAATAATCCTCTCAATTAG
313578           CCATCCACCTCCAACTTTTC              GGCACTTCCAACACAACTTG
1548315          CCATTCCTGCTACGTGATAC              GTAGCCCTTTGCTACATGTG
32939            AGGACCAGACTTAGCTTCAG              TTTAAGACTTGAGGTGTGCC
676015           AAAAAACCGAAGTGTGGGAG              GTATGTAAGCATGATTGGTC
325447           ACAAGGACATAAAAGGATGG              TGACAGTGTTGAGCACTTTC
1044011          GGATATCAGTGTTCCTCATG              AAGCACAGTGGACATGTTTC
12981            TTCACTTCAGTGCTACAGCAAAA           AGGGCAGAGATGGAACAATG
803715           TCCATATATGGAAAGGTGGG              AACTCATTGAAGGCATAGGC
876129           AACCCCACAAATACCAAAAG              GTCTGTAGGATGGAAATGTG
1627521          GAGGTGGATTTGGGCATATG              ACCCTTCGAACATCTCTTCC
1323140          TGATTGCATCAGGAACTGAG              GTAGCCACAAAGAGATACAG
1112370          TGTACTGGTTCTGGGTATAG              CGATACTTGTACCTGTATGG
12465            TTCTAATGCCATGGGTTTGG              AATCCACCTGAAAGTGCTGC
1054745          GACTTTTAGGTCTGAGTTGG              CTTCCTCTAGCAGTGTATTTC
841229           AGAGCAAAGCCATGCCAGAG              CTCCCAACAGCTCAGCTCTC
1344533          ATGAAGTGGCAGTGATTTGC              ATGATGCAGGGCTCTAGAAG
769425           TTGGAAATGCCAATGCCCTC              TAGAAGTTGGCCACTTCCTG
8196             GATGGCCTCCAGAGGAGCTC              TCACCAGAGACACCAGTCC
492170           GGTAAATGAAGCAGCTACAG              GTTTTGGGTATTTGTTGGGG
476476           TGTGGCAGAGGTCTAAAGAA              AGGCATCCTCTTTGTCTTTG
AB               AATGTGAAAGCCTCCTGAAC              GGGCAAGTTTTATGCCATG
760427           AGACAGCCTCCACACTGTGTG             AGACTTCCCCTCTCAGAGTG
896169           TTCAGCTTTAGGGACCATGC              TCTTCCCAAGGCTAGTTTCC
536161           GCCTGTGGTCAGAAGAAAAC              AATAAGGCGGCTCCACAAAC
487105           TTCTATTGGAATCTCCACGG              CCGGCCGTAGTTATTGTTTT
220479           ACGTTGGATGTGCACAAGACCTGCAGCCTC    ACGTTGGATGTTCTGGAGCTCAAACACGGC
892005           ACGTTGGATGGGGTTTAGGAAAACAAACCT    ACGTTGGATGCAGTGCACTGATAACTATTC
3088091          ACGTTGGATGATTGCCACACAGTTAACTGG    ACGTTGGATGCACCCATATCTCATCAGGAG

Samples were incubated at 95° C. for 15 minutes, followed by 45 cycles of 95° C. for 20 seconds, 56° C. for 30 seconds, and 72° C. for 1 minute, finishing with a 3 minute final extension at 72° C. Following amplification, shrimp alkaline phosphatase (SAP) (0.3 units in a 2 μl volume) (Amersham Pharmacia) was added to each reaction (total reaction volume was 7 μl) to remove any residual dNTPs that were not consumed in the PCR step. Samples were incubated for 20 minutes at 37° C., followed by 5 minutes at 85° C. to denature the SAP.
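
The PCR cycling conditions above can be written down as a small data structure, which also makes it easy to total the programmed hold times. The dictionary layout and function name below are hypothetical conveniences, and temperature ramping between steps is ignored.

```python
# Hold times transcribed from the PCR protocol above (seconds).
PCR_PROGRAM = {
    "initial_denaturation_s": 15 * 60,  # 95 C for 15 minutes
    "cycles": 45,
    "cycle_steps_s": [20, 30, 60],      # 95 C, 56 C and 72 C within each cycle
    "final_extension_s": 3 * 60,        # 72 C for 3 minutes
}


def total_hold_time_s(program):
    """Sum of all programmed holds; ramp times between temperatures are not included."""
    per_cycle = sum(program["cycle_steps_s"])
    return (program["initial_denaturation_s"]
            + program["cycles"] * per_cycle
            + program["final_extension_s"])


print(total_hold_time_s(PCR_PROGRAM) / 60.0)  # minutes of programmed holds
```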

Once the SAP reaction was complete, a primer extension reaction was initiated by adding a polymorphism-specific MassEXTEND™ primer cocktail to each sample. Each MassEXTEND™ cocktail included a specific combination of dideoxynucleotides (ddNTPs) and deoxynucleotides (dNTPs) used to distinguish polymorphic alleles from one another. Methods for verifying, allelotyping and genotyping SNPs are disclosed, for example, in U.S. Pat. No. 6,258,538, the content of which is hereby incorporated by reference. In Table 4, the ddNTPs of each termination mix are shown, and the fourth nucleotide, which is not shown, is the dNTP.

TABLE 4
Extension Primers

SNP Reference    Extend Probe                Termination Mix
911229           TGACACTCTAGCAATTTTATTAAT    ACT
1020445          AGGCAAACAAACACTTCATGC       ACG
161446           ACCCTTCATGCTGAAAACTCT       ACG
AA               CCTTTCCAGCAAGGCTACAC        ACG
1868220          GGCAAGTATAATCTGCCTGATA      ACT
868767           CAGCTTTTCTCAAAGGGTCC        ACG
872478           TTTACTTACAGGAGAGGAAA        ACT
313578           TTGGTAATGTTGACATTTGCTG      ACT
1548315          TGCTACGTGATACTCAACTGATA     ACT
32939            CAGCCAAGAGCAAGCTTCC         ACT
676015           GAGCAGAGGGAGAGAAAAAG        ACG
325447           GGACATAAAAGGATGGGAAAAA      ACT
1044011          GCTTCCAGATTTGTAAGATT        ACT
12981            CACACAGAATTCACTCTT          ACG
803715           AAAGTCCAGATAGGAGGTATCT      ACT
876129           CCACAAATACCAAAAGACCTACC     ACT
1627521          GCATATGCATGAAAAAACTTTCT     ACT
1323140          GAACTGAGCCCACATCCTCT        ACT
1112370          GGAAAAAAGTCAGTTTAACCAAA     ACG
12465            TGGAGTCGGAACACTTTT          ACT
1054745          TTGGTCCATTAGGGAATTAGA       ACG
841229           GAGAGACACGGTCAGGGG          ACG
1344533          TGCAGGGACTGTGACAAATC        ACT
769425           TGCCCTCCCCCACACTCT          ACT
8196             AGGAGCAGCTGCAGGGCA          ACG
492170           GAAGCAGCTACAGAAAGCTTTT      ACG
476476           TGGCAGAGCTCTAAAGAAATGACT    ACG
AB               CAGGTCCTGGAATAGAGAAC        ACT
760427           CAGAGTGGGACTCCTTGCT         ACT
896169           GGCTCATCTATCCCTTGCC         CGT
536161           AACTCTGCAACCTGATCAC         CGT
487105           GAATCTCCACGGAGTTCAGA        ACT
220479           ACCGTGGAGCCCCCGCGA          ACG
892005           CATGTTTTCAAAAACTAAGTTACT    ACG
3088091          GGTTATGATCACCACGTAC         ACT
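
How a termination mix distinguishes alleles can be illustrated with a simplified model: the extension primer grows base by base until a nucleotide supplied as a ddNTP is incorporated, so different alleles yield products of different length and mass. The sketch below assumes this simplified behaviour; the function name and the downstream sequences are hypothetical, while the probe and its ACT termination mix are taken from the entry for SNP 3088091 in Table 4.

```python
def extend_primer(primer, downstream, termination_mix):
    """Simulate primer extension with a ddNTP termination mix.

    primer: extension primer sequence (5'->3').
    downstream: bases that would be added to the primer's 3' end, in order.
    termination_mix: bases supplied as ddNTPs (e.g., "ACT"); any other base is a dNTP.
    """
    product = primer
    for base in downstream:
        product += base
        if base in termination_mix:
            break  # a dideoxynucleotide was incorporated, so the chain terminates
    return product


probe = "GGTTATGATCACCACGTAC"  # extend probe for SNP 3088091, termination mix ACT
# Hypothetical downstream sequences for two alleles at the polymorphic site:
allele_1 = extend_primer(probe, "ACC", "ACT")  # ddA terminates after one base
allele_2 = extend_primer(probe, "GTC", "ACT")  # dG is added, then ddT terminates
print(len(allele_1) - len(probe), len(allele_2) - len(probe))
```

In this model the two alleles give products extended by one and two bases respectively, which is the kind of mass difference that mass spectrometry can resolve.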

The MassEXTEND™ reaction was performed in a total volume of 9 μl, with the addition of 1× ThermoSequenase buffer, 0.576 units of ThermoSequenase (Amersham Pharmacia), 600 nM MassEXTEND™ primer, 2 mM of ddATP and/or ddCTP and/or ddGTP and/or ddTTP, and 2 mM of dATP or dCTP or dTTP. The deoxy nucleotide (dNTP) used in the assay normally was complementary to the nucleotide at the polymorphic site in the amplicon. Samples were incubated at 94° C. for 2 minutes, followed by 55 cycles of 5 seconds at 94° C., 5 seconds at 52° C., and 5 seconds at 72° C.
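The way a termination mix separates two alleles can be sketched in a few lines of code. The example below is a simplified model rather than the assay software: the function name `extend_primer` and both template sequences are hypothetical, and the model assumes that the one base absent from the three-letter termination mix is supplied as a dNTP, so that extension stops at the first incorporated ddNTP.

```python
# Illustrative sketch only. The template sequences used in the demo are
# hypothetical and are not taken from the tables or figures of this document.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_primer(template_downstream, termination_mix):
    """Return the bases added to the 3' end of the extension primer.

    template_downstream: template bases in the order the polymerase reads
        them, starting at the polymorphic site.
    termination_mix: letters of the ddNTPs present (for example "ACT");
        the remaining base is assumed to be present as a dNTP.
    """
    added = []
    for template_base in template_downstream:
        incoming = COMPLEMENT[template_base]
        added.append(incoming)
        if incoming in termination_mix:
            break  # a ddNTP was incorporated, so extension terminates
    return "".join(added)


if __name__ == "__main__":
    # Hypothetical alleles differing only at the first (polymorphic) base.
    print(extend_primer("TAG", "ACT"))  # allele with T in the template
    print(extend_primer("CAG", "ACT"))  # allele with C in the template
```

In this model one allele yields a primer extended by a single ddNTP and the other a primer extended by a dNTP followed by a ddNTP, so the two products differ in length and therefore in mass.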

Following incubation, samples were desalted by adding 16 μl of water (total reaction volume was 25 μl) and 3 mg of SpectroCLEAN™ sample cleaning beads (Sequenom, Inc.) and incubating for 3 minutes with rotation. Samples were then robotically dispensed using a piezoelectric dispensing device (SpectroJET™ (Sequenom, Inc.)) onto either 96-spot or 384-spot silicon chips containing a matrix that crystallized each sample (SpectroCHIP™ (Sequenom, Inc.)). Subsequently, MALDI-TOF mass spectrometry (Biflex and Autoflex MALDI-TOF mass spectrometers (Bruker Daltonics) can be used) and SpectroTYPER RT™ software (Sequenom, Inc.) were used to analyze and interpret the SNP genotype for each sample.

Genetic Analysis

Minor allelic frequencies for the polymorphisms set forth in FIGS. 1A-1B were verified as being 10% or greater using the extension assay described above in a group of samples isolated from 92 individuals originating from the state of Utah in the United States, Venezuela and France (Coriell cell repositories).

Table 5 shows allelotyping results in female breast cancer and female control pools. Allele frequency is noted in the second and third columns for breast cancer pools and control pools, respectively, and the allele indicated in bold type is the dominant allele. Genotyping results are shown for female pools in Table 6. In the subsequent tables, “AF” refers to allelic frequency; and “F case” and “F control” refer to female case and female control groups, respectively.

TABLE 5
Allelotyping Results

SNP Reference   AF M case                AF M control             p-value
911229          C = 0.685; G = 0.315     C = 0.776; G = 0.224     0.0012
1020445         T = 0.507; C = 0.493     T = 0.517; C = 0.483     0.7483
161446          C = 0.657; T = 0.343     C = 0.737; T = 0.263     0.0043
AA              T = 0.734; C = 0.266     T = 0.663; C = 0.337     0.0114
1868220         G = 0.508; A = 0.492     G = 0.642; A = 0.358     0.0001
AB              A = 0.792; G = 0.208     A = 0.864; G = 0.136     0.0026
872478          C = 0.857; G = 0.143     C = 0.785; G = 0.215     0.0026
313578          G = 0.726; C = 0.274     G = 0.803; C = 0.197     0.0037
1548315         G = 0.651; T = 0.349     G = 0.720; T = 0.280     0.0146
32939           G = 0.576; A = 0.424     G = 0.679; A = 0.321     0.0007
676015          C = 0.696; T = 0.304     C = 0.627; T = 0.373     0.0166
325447          C = 0.568; A = 0.432     C = 0.478; A = 0.522     0.0034
1044011         G = 0.571; T = 0.429     G = 0.652; T = 0.348     0.0071
12981           T = 0.868; C = 0.132     T = 0.784; C = 0.216     0.0005
803715          G = 0.880; A = 0.120     G = 0.806; A = 0.194     0.0015
876129          C = 0.634; G = 0.366     C = 0.725; G = 0.275     0.0017
1627521         T = 0.702; C = 0.298     T = 0.630; C = 0.370     0.0132
1323140         G = 0.678; A = 0.322     G = 0.599; A = 0.401     0.0073
1112370         T = 0.543; C = 0.457     T = 0.617; C = 0.383     0.0145
12465           G = 0.849; A = 0.151     G = 0.779; A = 0.221     0.0040
1054745         A = 0.793; G = 0.207     A = 0.718; G = 0.282     0.0051
841229          T = 0.530; C = 0.470     T = 0.443; C = 0.557     0.0048
1344533         G = 0.682; A = 0.318     G = 0.580; A = 0.420     0.0009
769425          G = 0.723; T = 0.277     G = 0.805; T = 0.195     0.0020
8196            T = 0.633; C = 0.367     T = 0.554; C = 0.446     0.0090
492170          G = 0.782; A = 0.218     G = 0.715; A = 0.285     0.0128
476476          T = 0.418; C = 0.582     T = 0.511; C = 0.489     0.0027
760427          G = 0.661; A = 0.339     G = 0.818; A = 0.182     0.0000
896169          A = 0.539; C = 0.461     A = 0.621; C = 0.379     0.0072
536161          A = 0.663; T = 0.337     A = 0.589; T = 0.411     0.0117
487105          G = 0.732; A = 0.268     G = 0.857; A = 0.143     0.0000

Genotyping results are shown for breast cancer pools and control pools in Tables 6A and 6B. Table 6A shows the original genotyping results and Table 6B shows the genotype results re-analyzed to remove duplicate individuals from the cases and controls (i.e., individuals who were erroneously included more than once as either cases or controls). Therefore, Table 6B represents a more accurate measure of the allele frequencies for this particular SNP.

In the subsequent tables, “AF” refers to allelic frequency. Particularly significant associations with breast cancer are indicated by a calculated p-value of less than 0.05 for genotype results, which are set forth in bold text.

TABLE 6A
Original Genotyping Results

SNP Reference   AF case                  AF control               p-value
911229          C = 0.830; G = 0.170     C = 0.888; G = 0.112     0.0075
1020445         T = 0.650; C = 0.350     T = 0.702; C = 0.298     0.0676
161446          C = 0.672; T = 0.328     C = 0.753; T = 0.247     0.0039
AA              T = 0.769; C = 0.231     T = 0.694; C = 0.306     0.0061
1868220         G = 0.589; A = 0.411     G = 0.694; A = 0.306     0.0006
868767          A = 0.857; G = 0.143     A = 0.917; G = 0.083     0.0027
872478          C = 0.822; G = 0.178     C = 0.742; G = 0.258     0.0021
313578          G = 0.846; C = 0.154     G = 0.901; C = 0.099     0.0076
1548315         G = 0.735; T = 0.265     G = 0.789; T = 0.211     0.0435
32939           G = 0.731; A = 0.269     G = 0.797; A = 0.203     0.0112
676015          C = 0.687; T = 0.313     C = 0.600; T = 0.400     0.0035
325447          C = 0.678; A = 0.322     C = 0.590; A = 0.410     0.0036
1044011         G = 0.651; T = 0.349     G = 0.747; T = 0.253     0.0065
12981           T = 0.912; C = 0.088     T = 0.870; C = 0.130     0.0252
803715          G = 0.916; A = 0.084     G = 0.877; A = 0.123     0.0372
876129          C = 0.610; G = 0.390     C = 0.698; G = 0.302     0.0029
1627521         T = 0.630; C = 0.370     T = 0.555; C = 0.445     0.0131
1323140         G = 0.765; A = 0.235     G = 0.694; A = 0.306     0.0099
1112370         T = 0.603; C = 0.397     T = 0.695; C = 0.305     0.0023
12465           G = 0.920; A = 0.080     G = 0.875; A = 0.125     0.0137
1054745         A = 0.871; G = 0.129     A = 0.805; G = 0.195     0.0039
841229          T = 0.511; C = 0.489     T = 0.420; C = 0.580     0.0029
1344533         G = 0.776; A = 0.224     G = 0.700; A = 0.300     0.0050
769425          G = 0.882; T = 0.118     G = 0.925; T = 0.075     0.0551
8196            T = 0.740; C = 0.260     T = 0.673; C = 0.327     0.0169
492170          G = 0.684; A = 0.316     G = 0.622; A = 0.378     0.0305
476476          T = 0.515; C = 0.485     T = 0.577; C = 0.423     0.0381
AB              G = 0.898; A = 0.102     G = 0.850; A = 0.150     0.0189
760427          G = 0.858; A = 0.142     G = 0.899; A = 0.101     0.0389
896169          A = 0.697; C = 0.303     A = 0.762; C = 0.238     0.0158
536161          A = 0.744; T = 0.256     A = 0.688; T = 0.312     0.0411
487105          G = 0.867; A = 0.133     G = 0.915; A = 0.085     0.0111
220479          G = 0.856; A = 0.144     G = 0.812; A = 0.188     0.0518
892005          G = 0.250; A = 0.750     G = 0.190; A = 0.810     0.0237
3088091         A = 0.250; G = 0.750     A = 0.190; G = 0.810     0.0157

In Table 6B, the allele frequency for the A2 allele is noted in the fourth and fifth columns for breast cancer pools and control pools, respectively, where “AF” is allele frequency. The allele frequency for the A1 allele can be easily calculated by subtracting the A2 allele frequency from 1 (A1 AF=1−A2 AF). For example, the SNP in row 2 of Table 6B (rs1671152) has the following case and control allele frequencies: case A1 (T)=0.143; case A2 (G)=0.857; control A1 (T)=0.190; and control A2 (G)=0.810, where the nucleotide is provided in parentheses.

TABLE 6B
Re-analyzed Genotyping Results

SNP Reference   A1   A2   A2 AF Case   A2 AF Control   p-value   OR
220479          G    A    0.143        0.189           0.0517    0.72
911229          G    C    0.819        0.885           0.0039    0.59
1020445         C    T    0.648        0.705           0.0523    0.77
161446          C    T    0.384        0.293           0.0024    1.50
AA              C    T    0.782        0.698           0.0023    1.56
1868220         A    G    0.591        0.697           0.0004    0.63
868767          G    A    0.869        0.905           0.0704    0.70
872478          C    G    0.184        0.266           0.0020    0.62
313578          C    G    0.843        0.896           0.0123    0.62
1548315         T    G    0.738        0.785           0.0805    0.77
32939           A    G    0.740        0.797           0.0304    0.72
892005          G    A    0.750        0.832           0.0040    0.60
676015          C    T    0.316        0.392           0.0124    0.72
325447          A    C    0.677        0.584           0.0026    1.49
896169          C    A    0.697        0.757           0.0339    0.74
1044011         T    G    0.653        0.722           0.0177    0.72
536161          T    A    0.743        0.689           0.0591    1.30
12981           C    T    0.916        0.866           0.0117    1.68
803715          A    G    0.919        0.875           0.0227    1.61
876129          C    G    0.378        0.303           0.0122    1.40
487105          A    G    0.864        0.917           0.0077    0.58
1627521         T    C    0.362        0.452           0.0043    0.69
1323140         A    G    0.765        0.685           0.0055    1.49
1112370         C    T    0.607        0.694           0.0042    0.68
12465           A    G    0.928        0.876           0.0064    1.81
1054745         G    A    0.866        0.801           0.0061    1.60
3088091         A    G    0.758        0.817           0.0229    0.70
841229          C    T    0.523        0.419           0.0009    1.52
1344533         A    G    0.776        0.703           0.0093    1.46
769425          T    G    0.883        0.918           0.0628    0.67
8196            C    T    0.747        0.681           0.0216    1.38
492170          G    A    0.326        0.375           0.1060    0.81
476476          C    T    0.514        0.577           0.0461    0.78
AB              A    G    0.894        0.839           0.0124    1.61
760427          A    G    0.859        0.899           0.0494    0.68

The last column in Table 6B is labeled “OR,” which is an abbreviation for “odds ratio.” An odds ratio is an unbiased estimate of relative risk which can be obtained from most case-control studies. Relative risk (RR) is an estimate of the likelihood of disease in the exposed group (susceptibility allele or genotype carriers) compared to the unexposed group (not carriers). It can be calculated by the following equation:

RR = I_A / I_a

where I_A is the incidence of disease in the A carriers and I_a is the incidence of disease in the non-carriers.

-   RR>1 indicates the A allele increases disease susceptibility.
-   RR<1 indicates the a allele increases disease susceptibility.

For example, RR=1.5 indicates that carriers of the A allele have 1.5 times the risk of disease of non-carriers, i.e., they are 50% more likely to get the disease.

Case-control studies do not allow the direct estimation of I_A and I_a; therefore, relative risk cannot be directly estimated. However, the odds ratio (OR) can be calculated using the following equation:

OR = (n_DA n_da)/(n_dA n_Da) = p_DA(1 − p_dA)/(p_dA(1 − p_DA)), or

OR = ((case f)/(1 − case f))/((control f)/(1 − control f)),

where f = susceptibility allele frequency.

An odds ratio can be interpreted in the same way a relative risk is interpreted and can be directly estimated using the data from case-control studies, i.e., case and control allele frequencies. The higher the odds ratio value, the larger the effect that particular allele has on the development of breast cancer. Possessing an allele associated with a relatively high odds ratio translates to having a higher risk of developing or having breast cancer.
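The frequency-based form of the odds ratio can be checked numerically against Table 6B. The following is a minimal sketch, with the function name `odds_ratio` chosen here for illustration; the three pairs of A2 allele frequencies are copied from Table 6B, and the results are rounded to two decimal places for comparison with the OR column.

```python
def odds_ratio(case_af, control_af):
    """Allelic odds ratio from the case and control frequencies of one allele."""
    return (case_af / (1.0 - case_af)) / (control_af / (1.0 - control_af))


# (A2 AF Case, A2 AF Control) for three SNP references in Table 6B.
TABLE_6B_ROWS = {
    "220479": (0.143, 0.189),
    "911229": (0.819, 0.885),
    "161446": (0.384, 0.293),
}

if __name__ == "__main__":
    for snp, (case_af, control_af) in TABLE_6B_ROWS.items():
        print(snp, round(odds_ratio(case_af, control_af), 2))
```

With these inputs the computed values agree with the tabulated odds ratios of 0.72, 0.59 and 1.50. Because the A1 frequency is 1 minus the A2 frequency, the odds ratio for the A1 allele is simply the reciprocal of the value for A2.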

All of the single marker alleles set forth in FIGS. 1A-1B were considered validated, since the genotyping data agreed with the allelotyping data and each SNP was significantly associated with breast cancer. Particularly significant associations with breast cancer are indicated by a calculated p-value of less than 0.02 for allelotype results and a calculated p-value of less than 0.05 for genotype results, which are set forth in bold text.

Example 3 In Vitro Production of Target Polypeptides

cDNA is cloned into a pIVEX 2.3-MCS vector (Roche Biochem) using a directional cloning method. A cDNA insert is prepared using PCR with forward and reverse primers having 5′ restriction site tags (in frame) and 5-6 additional nucleotides in addition to 3′ gene-specific portions, the latter of which are typically about twenty to about twenty-five base pairs in length. A Sal I restriction site is introduced by the forward primer and a Sma I restriction site is introduced by the reverse primer. The ends of PCR products are cut with the corresponding restriction enzymes (i.e., Sal I and Sma I) and the products are gel-purified. The pIVEX 2.3-MCS vector is linearized using the same restriction enzymes, and the fragment of the correct size is isolated by gel-purification. Purified PCR product is ligated into the linearized pIVEX 2.3-MCS vector and E. coli cells are transformed for plasmid amplification. The newly constructed expression vector is verified by restriction mapping and used for protein production.
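The layout of the tagged primers described above (additional nucleotides, then the restriction site, then the gene-specific portion) can be sketched as follows. This is an illustration under stated assumptions: the function names, the six-nucleotide pad `GATCGA`, and the default gene-specific length of 20 are hypothetical choices, and the sketch does not check reading frame or whether the cDNA itself contains a Sal I or Sma I site.

```python
SAL_I_SITE = "GTCGAC"  # Sal I recognition sequence, carried by the forward primer
SMA_I_SITE = "CCCGGG"  # Sma I recognition sequence, carried by the reverse primer

_COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT_TABLE)[::-1]


def tagged_primers(cdna, gene_specific_length=20, pad="GATCGA"):
    """Build (forward, reverse) primers as pad + site + gene-specific part.

    The pad stands in for the 5-6 additional nucleotides; its sequence is a
    hypothetical placeholder, not a value taken from this document.
    """
    forward = pad + SAL_I_SITE + cdna[:gene_specific_length]
    reverse = pad + SMA_I_SITE + reverse_complement(cdna[-gene_specific_length:])
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical 45-nucleotide coding sequence used only for demonstration.
    demo_cdna = "ATGGCTAGCAAGGAGCTGTTCACCGGTGTGGTGCCAATCCTGTAA"
    for primer in tagged_primers(demo_cdna):
        print(primer)
```

In practice the pad and the junction between the site and the gene-specific portion would be adjusted so that the insert remains in frame with the vector's tag.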

E. coli lysate is reconstituted with 0.25 ml of Reconstitution Buffer; the Reaction Mix is reconstituted with 0.8 ml of Reconstitution Buffer; the Feeding Mix is reconstituted with 10.5 ml of Reconstitution Buffer; and the Energy Mix is reconstituted with 0.6 ml of Reconstitution Buffer. 0.5 ml of the Energy Mix is added to the Feeding Mix to obtain the Feeding Solution. 0.75 ml of Reaction Mix, 50 μl of Energy Mix, and 10 μg of the template DNA are added to the E. coli lysate to obtain the Reaction Solution.

Using the reaction device (Roche Biochem), 1 ml of the Reaction Solution is loaded into the reaction compartment. The reaction device is turned upside-down and 10 ml of the Feeding Solution is loaded into the feeding compartment. All lids are closed and the reaction device is loaded into the RTS500 instrument. The instrument is run at 30° C. for 24 hours with a stir bar speed of 150 rpm. The pIVEX 2.3-MCS vector includes a nucleotide sequence that encodes six consecutive histidine amino acids on the C-terminal end of the target polypeptide for the purpose of protein purification. Target polypeptide is purified by contacting the contents of the reaction device with resin modified with Ni²⁺ ions. Target polypeptide is eluted from the resin with a solution containing free Ni²⁺ ions.

Example 4 Cellular Production of Target Polypeptides

Nucleic acids are cloned into DNA plasmids having phage recombination sites and target polypeptides are expressed therefrom in a variety of host cells. λ phage genomic DNA contains short sequences known as attP sites, and E. coli genomic DNA contains unique, short sequences known as attB sites. These regions share homology, allowing for integration of phage DNA into E. coli via directional, site-specific recombination using the phage protein Int and the E. coli protein IHF. Integration produces two new att sites, L and R, which flank the inserted prophage DNA. Phage excision from E. coli genomic DNA can also be accomplished using these two proteins with the addition of a second phage protein, Xis. DNA vectors have been produced where the integration/excision process is modified to allow for the directional integration or excision of a target DNA fragment into a backbone vector in a rapid in vitro reaction (Gateway™ Technology (Invitrogen, Inc.)).

A first step is to transfer the nucleic acid insert into a shuttle vector that contains attL sites surrounding the negative selection gene, ccdB (e.g., pENTR vector, Invitrogen, Inc.). In a first method, this transfer is accomplished by digesting the nucleic acid from a DNA vector used for sequencing and ligating it into the multicloning site of the shuttle vector, which places it between the two attL sites while removing the negative selection gene ccdB. A second method is to amplify the nucleic acid by the polymerase chain reaction (PCR) with primers containing attB sites. The amplified fragment then is integrated into the shuttle vector using Int and IHF. A third method is to utilize a topoisomerase-mediated process, in which the nucleic acid is amplified via PCR using gene-specific primers with the 5′ upstream primer containing an additional CACC sequence (e.g., TOPO® expression kit (Invitrogen, Inc.)). In conjunction with Topoisomerase I, the PCR amplified fragment can be cloned into the shuttle vector via the attL sites in the correct orientation.

Once the nucleic acid is transferred into the shuttle vector, it can be cloned into an expression vector having attR sites. Several vectors containing attR sites for expression of target polypeptide as a native polypeptide, N-fusion polypeptide, and C-fusion polypeptide are commercially available (e.g., pDEST (Invitrogen, Inc.)), and any vector can be converted into an expression vector for receiving a nucleic acid from the shuttle vector by introducing an insert having an attR site flanked by an antibiotic resistance gene for selection using the standard methods described above. Transfer of the nucleic acid from the shuttle vector is accomplished by directional recombination using Int, IHF, and Xis (LR clonase). The desired sequence can then be transferred to an expression vector by carrying out a one hour incubation at room temperature with Int, IHF, and Xis, a ten minute incubation at 37° C. with proteinase K, transforming bacteria and allowing expression for one hour, and then plating on selective media. Generally, 90% cloning efficiency is achieved by this method. Examples of expression vectors are pDEST 14 bacterial expression vector with a T7 promoter, pDEST 15 bacterial expression vector with a T7 promoter and an N-terminal GST tag, pDEST 17 bacterial vector with a T7 promoter and an N-terminal polyhistidine affinity tag, and pDEST 12.2 mammalian expression vector with a CMV promoter and neo resistance gene. These expression vectors or others like them are transformed or transfected into cells for expression of the target polypeptide or polypeptide variants. These expression vectors are often transfected, for example, into murine-transformed adipocyte cell line 3T3-L1 (ATCC), human embryonic kidney cell line 293, and rat cardiomyocyte cell line H9C2.

Modifications may be made to the foregoing without departing from the basic aspects of the invention. Although the invention has been described in substantial detail with reference to one or more specific embodiments, those of skill in the art will recognize that changes may be made to the embodiments specifically disclosed in this application, yet these modifications and improvements are within the scope and spirit of the invention, as set forth in the claims which follow.

Citation of the above publications or documents is not intended as an admission that any of the foregoing is pertinent prior art, nor does it constitute any admission as to the contents or date of these publications or documents. U.S. patents and other publications referenced herein are hereby incorporated by reference.

1. A method for identifying a subject at risk of breast cancer, which comprises detecting the presence or absence of one or more polymorphic variations associated with breast cancer in a nucleic acid sample from a subject, wherein the one or more polymorphic variations are detected in a nucleotide sequence selected from the group consisting of: (a) a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (b) a nucleotide sequence which encodes a polypeptide encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (c) a nucleotide sequence which encodes a polypeptide that is 90% or more identical to the amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (d) a fragment of a nucleotide sequence of (a), (b), or (c); whereby the presence of the polymorphic variation is indicative of the subject being at risk of breast cancer.
2. The method of claim 1, which further comprises obtaining the nucleic acid sample from the subject.
3. The method of claim 1, wherein the one or more polymorphic variations are detected at one or more chromosome positions in FIGS. 1A-1B.
4. The method of claim 1, wherein the one or more polymorphic variations are detected at one or more chromosome positions selected from the group consisting of 7817039, 60753893, 16192532, 127007287, and 4731448.
5. The method of claim 4, wherein a polymorphic variation is detected at chromosome position 60753893.
6. The method of claim 4, wherein a polymorphic variation is detected at chromosome position 16192532.
7. The method of claim 4, wherein a polymorphic variation is detected at chromosome position 4731448.
8. The method of claim 1, wherein the one or more polymorphic variations are detected at one or more positions in linkage disequilibrium with one or more chromosome positions selected from the group consisting of 7817039, 60753893, 16192532, 127007287, and 4731448.
9. The method of claim 1, wherein detecting the presence or absence of the one or more polymorphic variations comprises: hybridizing an oligonucleotide to the nucleic acid sample, wherein the oligonucleotide is complementary to a nucleotide sequence in the nucleic acid and hybridizes to a region adjacent to the polymorphic variation; extending the oligonucleotide in the presence of one or more nucleotides, yielding extension products; and detecting the presence or absence of a polymorphic variation in the extension products.
10. The method of claim 1, wherein the subject is a human.
 11. A method for identifying apolymorphic variation associated with breast cancer proximal to anincident polymorphic variation associated with breast cancer, whichcomprises: identifying a polymorphic variation proximal to the incidentpolymorphic variation associated with breast cancer, wherein thepolymorphic variation is detected in a nucleotide sequence selected fromthe group consisting of: (a) a nucleotide sequence in FIGS. 1A-1B orFIG. 2; (b) a nucleotide sequence which encodes a polypeptide encoded bya nucleotide sequence in FIGS. 1A-1B or FIG. 2; (c) a nucleotidesequence which encodes a polypeptide that is 90% or more identical tothe amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1Bor FIG. 2; (d) a fragment of a nucleotide sequence of (a), (b), or (c)comprising the polymorphic variation; determining the presence orabsence of an association of the proximal polymorphic variant withbreast cancer.
 12. The method of claim 11, wherein the incidentpolymorphic variation is at a chromosome position listed in FIGS. 1A-1B.13. The method of claim 11, wherein the incident polymorphic variationis at a chromosome position selected from the group consisting of7817039, 60753893, 16192532, 127007287, and
 4731448. 14. The method ofclaim 11, wherein the proximal polymorphic variation is within a regionbetween about 5 kb 5′ of the incident polymorphic variation and about 5kb 3′ of the incident polymorphic variation.
 15. The method of claim 11,which further comprises determining whether the proximal polymorphicvariation is in linkage disequilibrium with the incident polymorphicvariation.
 16. The method of claim 11, which further comprisesidentifying a second polymorphic variation proximal to the identifiedproximal polymorphic variation associated with breast cancer anddetermining if the second proximal polymorphic variation is associatedwith breast cancer.
 17. The method of claim 16, wherein the secondproximal polymorphic variant is within a region between about 5 kb 5′ ofthe incident polymorphic variation and about 5 kb 3′ of the proximalpolymorphic variation associated with breast cancer.
18. An isolated nucleic acid comprising a nucleotide sequence selected from the group consisting of: (a) a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (b) a nucleotide sequence which encodes a polypeptide encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (c) a nucleotide sequence which encodes a polypeptide that is 90% or more identical to the amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (d) a fragment of a nucleotide sequence of (a), (b), or (c); and (e) a nucleotide sequence complementary to the nucleotide sequences of (a), (b), (c), or (d); wherein the nucleotide sequence comprises a nucleotide at a chromosome position of FIGS. 1A-1B associated with breast cancer.
19. The isolated nucleic acid of claim 18, wherein the nucleotide sequence comprises a guanine at chromosome position 7817039, 60753893, 16192532, 127007287, and 4731448.
20. An oligonucleotide comprising a nucleotide sequence complementary to a portion of the nucleotide sequence of (a), (b), (c), or (d) in claim 18, wherein the 3′ end of the oligonucleotide is adjacent to a polymorphic variation associated with breast cancer.
21. A microarray comprising an isolated nucleic acid of claim 18 linked to a solid support.
22. An isolated polypeptide encoded by the isolated nucleic acid sequence of claim 18.
23. A method for identifying a candidate molecule that modulates cell proliferation, which comprises: (a) introducing a test molecule to a system which comprises a nucleic acid comprising a nucleotide sequence selected from the group consisting of: (i) a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (ii) a nucleotide sequence which encodes a polypeptide encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (iii) a nucleotide sequence which encodes a polypeptide that is 90% or more identical to the amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (iv) a fragment of a nucleotide sequence of (i), (ii), or (iii); or introducing a test molecule to a system which comprises a protein encoded by a nucleotide sequence of (i), (ii), (iii), or (iv); and (b) determining the presence or absence of an interaction between the test molecule and the nucleic acid or protein, whereby the presence of an interaction between the test molecule and the nucleic acid or protein identifies the test molecule as a candidate molecule that modulates cell proliferation.
24. The method of claim 23, wherein the system is an animal.
25. The method of claim 23, wherein the system is a cell.
26. The method of claim 23, wherein the nucleotide sequence comprises one or more polymorphic variations associated with breast cancer.
27. The method of claim 26, wherein the one or more polymorphic variations associated with breast cancer are at one or more chromosome positions in FIGS. 1A-1B.
28. The method of claim 27, wherein the chromosome position is selected from the group consisting of 7817039, 60753893, 16192532, 127007287, and 4731448.
29. A method for treating breast cancer in a subject, which comprises administering a candidate molecule identified by the method of claim 23 to a subject in need thereof, whereby the candidate molecule treats breast cancer in the subject.
30. A method for identifying a candidate therapeutic for treating breast cancer, which comprises: (a) introducing a test molecule to a system which comprises a nucleic acid comprising a nucleotide sequence selected from the group consisting of: (i) a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (ii) a nucleotide sequence which encodes a polypeptide encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (iii) a nucleotide sequence which encodes a polypeptide that is 90% or more identical to the amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (iv) a fragment of a nucleotide sequence of (i), (ii), or (iii); or introducing a test molecule to a system which comprises a protein encoded by a nucleotide sequence of (i), (ii), (iii), or (iv); and (b) determining the presence or absence of an interaction between the test molecule and the nucleic acid or protein, whereby the presence of an interaction between the test molecule and the nucleic acid or protein identifies the test molecule as a candidate therapeutic for treating breast cancer.
31. A method for treating breast cancer in a subject, which comprises contacting one or more cells of a subject in need thereof with a nucleic acid, wherein the nucleic acid comprises a nucleotide sequence selected from the group consisting of: (a) a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (b) a nucleotide sequence which encodes a polypeptide encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (c) a nucleotide sequence which encodes a polypeptide that is 90% or more identical to the amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (d) a fragment of a nucleotide sequence of (a), (b), or (c); and (e) a nucleotide sequence complementary to the nucleotide sequences of (a), (b), (c), or (d); whereby contacting the one or more cells of the subject with the nucleic acid treats breast cancer in the subject.
32. The method of claim 31, wherein the nucleic acid is RNA or PNA.
33. The method of claim 32, wherein the nucleic acid is duplex RNA.
34. A method for treating breast cancer in a subject, which comprises contacting one or more cells of a subject in need thereof with a protein, wherein the protein is encoded by a nucleotide sequence which comprises a polynucleotide sequence selected from the group consisting of: (a) a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (b) a nucleotide sequence which encodes a polypeptide encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (c) a nucleotide sequence which encodes a polypeptide that is 90% or more identical to the amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (d) a fragment of a nucleotide sequence of (a), (b), or (c); whereby contacting the one or more cells of the subject with the protein treats breast cancer in the subject.
35. A method for treating breast cancer in a subject, which comprises: detecting the presence or absence of one or more polymorphic variations associated with breast cancer in a nucleic acid sample from a subject, wherein the one or more polymorphic variations are detected in a nucleotide sequence selected from the group consisting of: (a) a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (b) a nucleotide sequence which encodes a polypeptide encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (c) a nucleotide sequence which encodes a polypeptide that is 90% or more identical to the amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (d) a fragment of a nucleotide sequence of (a), (b), or (c) comprising the polymorphic variation; and administering a breast cancer treatment to a subject in need thereof based upon the presence or absence of the one or more polymorphic variations in the nucleic acid sample.
36. The method of claim 35, wherein the one or more polymorphic variations are detected at one or more chromosome positions in FIGS. 1A-1B.
37. The method of claim 36, wherein the chromosome positions are selected from the group consisting of 7817039, 60753893, 16192532, 127007287, and 4731448.
38. The method of claim 35, which further comprises extracting and analyzing a tissue biopsy sample from the subject.
39. The method of claim 35, wherein the treatment is chemotherapy, surgery, radiation therapy, and combinations of the foregoing.
40. The method of claim 39, wherein the chemotherapy is selected from the group consisting of cyclophosphamide (Cytoxan), methotrexate (Amethopterin, Mexate, Folex), fluorouracil (Fluorouracil, 5-Fu, Adrucil), cyclophosphamide, doxorubicin (Adriamycin), and combinations of the foregoing.
41. The method of claim 40, wherein the combinations are selected from the group consisting of cyclophosphamide (Cytoxan), methotrexate (Amethopterin, Mexate, Folex), and fluorouracil (Fluorouracil, 5-Fu, Adrucil); cyclophosphamide, doxorubicin (Adriamycin), and fluorouracil; and doxorubicin and cyclophosphamide.
 42. A method for detecting or preventing breast cancerin a subject, which comprises: detecting the presence or absence of oneor more polymorphic variations associated with breast cancer in anucleic acid sample from a subject, wherein the polymorphic variation isdetected in a nucleotide sequence selected from the group consisting of:(a) a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (b) a nucleotidesequence which encodes a polypeptide encoded by a nucleotide sequence inFIGS. 1A-1B or FIG. 2; (c) a nucleotide sequence which encodes apolypeptide that is 90% or more identical to the amino acid sequenceencoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (d) afragment of a nucleotide sequence of (a), (b), or (c) comprising thepolymorphic variation; and administering a breast cancer preventative ordetection procedure to a subject in need thereof based upon the presenceor absence of the one or more polymorphic variations in the nucleic acidsample.
 43. The method of claim 42, wherein the one or more polymorphicvariations are detected at one or more chromosome positions in FIGS.1A-1B.
 44. The method of claim 42, wherein the breast cancer detectionprocedure is selected from the group consisting of a mamography, anearly mamography program, a frequent mamography program, a biopsyprocedure, a breast biopsy and biopsy from another tissue, a breastultrasound and optionally ultrasound analysis of another tissue, breastmagnetic resonance imaging (MRI) and optionally MRI analysis of anothertissue, electrical impedance (T-scan) analysis of breast and optionallyof another tissue, ductal lavage, nuclear medicine analysis (e.g.,scintimammography), BRCA1 and/or BRCA2 sequence analysis results,thermal imaging of the breast and optionally of another tissue, and acombination of the foregoing.
 45. The method of claim 42, wherein thebreast cancer preventative procedure is selected from the groupconsisting of one or more selective hormone receptor modulators, one ormore compositions that prevent production of hormones, one or morehormonal treatments, one or more biologic response modifiers, surgery,and drugs that delay or halt metastasis.
 46. The method of claim 45, wherein the selective hormone receptor modulator is selected from the group consisting of tamoxifen, raloxifene, and toremifene; the composition that prevents production of hormones is an aromatase inhibitor selected from the group consisting of exemestane, letrozole, anastrozole, goserelin, and megestrol; the hormonal treatment is selected from the group consisting of goserelin acetate and fulvestrant; the biologic response modifier is an antibody that specifically binds herceptin/HER2; the surgery is selected from the group consisting of lumpectomy and mastectomy; and the drug that delays or halts metastasis is pamidronate disodium.
 47. A method of targeting information for preventing or treating breast cancer to a subject in need thereof, which comprises: detecting the presence or absence of one or more polymorphic variations associated with breast cancer in a nucleic acid sample from a subject, wherein the polymorphic variation is detected in a nucleotide sequence selected from the group consisting of: (a) a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (b) a nucleotide sequence which encodes a polypeptide encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (c) a nucleotide sequence which encodes a polypeptide that is 90% or more identical to the amino acid sequence encoded by a nucleotide sequence in FIGS. 1A-1B or FIG. 2; (d) a fragment of a nucleotide sequence of (a), (b), or (c) comprising the polymorphic variation; and directing information for preventing or treating breast cancer to a subject in need thereof based upon the presence or absence of the one or more polymorphic variations in the nucleic acid sample.
 48. The method of claim 47, wherein the one or more polymorphic variations are detected at one or more chromosome positions in FIGS. 1A-1B.
 49. The method of claim 47, wherein the information comprises a description of a breast cancer detection procedure, a chemotherapeutic treatment, a surgical treatment, a radiation treatment, a preventative treatment of breast cancer, and combinations of the foregoing.
 50. A composition comprising a breast cancer cell and an antibody that specifically binds to a protein, polypeptide or peptide encoded by a nucleotide sequence identical to or 90% or more identical to a nucleotide sequence in FIGS. 1A-1B or FIG. 2.
 51. A composition comprising a breast cancer cell and an RNA, DNA, PNA or ribozyme molecule comprising a nucleotide sequence identical to or 90% or more identical to a portion of a nucleotide sequence in FIGS. 1A-1B or FIG. 2.
 52. The composition of claim 51, wherein the RNA molecule is a short inhibitory RNA molecule.